WO2023215371A1

WO2023215371A1 - System and method for perceptually optimized image denoising and restoration

Info

Publication number: WO2023215371A1
Application number: PCT/US2023/020820
Authority: WO
Inventors: Chih-Hsien Chou
Original assignee: Futurewei Technologies, Inc.
Priority date: 2022-05-04
Filing date: 2023-05-03
Publication date: 2023-11-09

Abstract

A system and method for perceptually optimized image denoising and restoration is provided. The system includes a degradation matrix module that generates a characterized input image from an input image and degradation parameters. A perceptually uniform color space (PUCS) conversion circuit generates an input PUCS image from the input image. A perceptual denoiser circuit generates a denoised PUCS image from the input PUCS image. An inverse PUCS conversion circuit generates an output image from the denoised PUCS image. The input PUCS image may be converted to an angular-frequency image before denoising and back to a pixel-domain image after denoising. A use case profile control circuit may provide conversion parameters to circuits of the system. The input image may be value scaled before denoising and value descaled after denoising. A strength control circuit may provide strength control parameters for value scaling and value rescaling.

Description

SYSTEM AND METHOD FOR PERCEPTUALLY OPTIMIZED IMAGE DENOISING AND RESTORATION

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This patent application claims the benefit of U.S. Provisional Patent Application No. 63/338,301 filed May 4, 2022, by Futurewei Technologies, Inc., and titled “System and Methods for Machine Learning-Based Image Denoising and Restoration,” which is hereby incorporated by reference.

TECHNICAL FIELD

[0002] The present disclosure is generally related to image denoising and restoration and specifically to denoising and restoration in a perceptually uniform color space.

BACKGROUND

[0003] Image denoising involves recovering a signal from measurements that may have been corrupted by noise. Image denoising may involve, either explicitly or implicitly, quantifying and characterizing the differences between image signals and noises. Applications for image denoising may include natural images (e.g., generated from visible light or other electromagnetic radiation) and artifactual images (e.g., generated by magnetic resonance imaging (MRI), positron emission tomography (PET), or other imaging systems) for both photographic systems (where the denoised images are targeted to be seen by human eyes) and computer vision systems (where the denoised images are targeted to be analyzed by computer vision algorithms). Image denoising may improve a perceived image quality for photographic systems and an accuracy and robustness of computer vision systems. SUMMARY

[0004] A first aspect relates to a perceptual image denoising system, comprising a perceptually uniform color space (PUCS) conversion circuit, configured to receive an input image in a first image format and generate an input PUCS image in a PUCS image format; a perceptual denoiser circuit, configured to receive the input PUCS image and generate a denoised PUCS image in the PUCS image format, the denoised PUCS image having less image noise than the input PUCS image; and an inverse PUCS conversion circuit, configured to receive the denoised PUCS image and generate an output image in the first image format.

[0005] Optionally, in any of the preceding aspects, another implementation of the aspect further includes a pixel-to-angle domain conversion circuit, configured to receive the input PUCS image and generate an input PUCS angular-frequency (AF) image in an AF domain, wherein the perceptual denoiser circuit is further configured to receive the input PUCS AF image and generate a denoised PUCS AF image; and an angle-to-pixel domain conversion circuit, configured to receive the denoised PUCS AF image and generate the denoised PUCS image.

[0006] Optionally, in any of the preceding aspects, another implementation of the aspect further includes a use case profile control circuit configured to provide conversion parameters to one or more of the PUCS conversion circuit, the pixel-to-angle domain conversion circuit, the angle-to- pixel domain conversion circuit, and the inverse PUCS conversion circuit.

[0007] Optionally, in any of the preceding aspects, another implementation of the aspect further includes a strength circuit configured to apply a transform function to the input image and to generate a value-scaled image, wherein the PUCS conversion circuit is further configured to receive the value-scaled image; and an inverse strength circuit configured to apply an inverse transform function to the output image and to generate a value-descaled output image. [0008] Optionally, in any of the preceding aspects, another implementation of the aspect further includes a strength control circuit configured to provide one or more strength control parameters to the strength circuit and the inverse strength circuit.

[0009] Optionally, in any of the preceding aspects, another implementation of the aspect provides the one or more strength control parameters comprise transform curves and/or inputoutput mixing factors.

[0010] Optionally, in any of the preceding aspects, another implementation of the aspect provides the denoiser circuit comprises a convolutional neural network (CNN).

[0011] Optionally, in any of the preceding aspects, another implementation of the aspect provides the CNN comprises one of a Recurrent CNN, a UNet, and a DenseNet.

[0012] Optionally, in any of the preceding aspects, another implementation of the aspect provides the CNN uses a bias-free CNN architecture, where all biases are removed or set to zero. [0013] Optionally, in any of the preceding aspects, another implementation of the aspect provides the denoiser circuit comprises a trained blind Gaussian denoiser.

[0014] Optionally, in any of the preceding aspects, another implementation of the aspect provides the denoiser circuit is configured to perform an iterative image restoration process until a termination criterion is detected.

[0015] Optionally, in any of the preceding aspects, another implementation of the aspect provides the termination criterion is a threshold difference between a current perceptually denoised image and a previous perceptually denoised image.

[0016] A second aspect relates to an image restoration system. The system includes a degradation matrix module configured to receive an input image and degradation parameters and generate a characterized input image; and a perceptual image denoising system according to any of the preceding claims, configured to receive the characterized input image and generate a perceptually denoised image.

[0017] Optionally, in any of the preceding aspects, another implementation of the aspect provides the degradation parameters are one of estimated degradation parameters and assigned degradation parameters.

[0018] Optionally, in any of the preceding aspects, another implementation of the aspect provides the assigned degradation parameters are precalculated parameters for characterizing deterministic corruptions.

[0019] Optionally, in any of the preceding aspects, another implementation of the aspect provides the estimated degradation parameters are estimated by a trained convolutional neural network (CNN) estimator.

[0020] Optionally, in any of the preceding aspects, another implementation of the aspect provides the estimated degradation parameters are estimated from the input image.

[0021] Optionally, in any of the preceding aspects, another implementation of the aspect provides the image is a subset of pixels from a larger image.

[0022] A third aspect relates to a method of training a perceptual image denoising system. The method includes generating from a clean target image in a first image format a clean perceptually uniform color space (PUCS) target image in a PUCS image format; generating from a noisy input image in the first image format a noisy PUCS input image in the PUCS image format; generating, from the noisy PUCS input image using a denoiser circuit comprising a convolutional neural network (CNN), an estimated PUCS residual image in the PUCS image format; calculating a target PUCS residual image in the PUCS image format from the clean PUCS target image and the noisy PUCS input image; generating a PUCS loss value for residual learning in the PUCS image format, based on the target PUCS residual image and the estimated PUCS residual image; and modifying a convolutional neural network (CNN) of the denoiser circuit based on the PUCS loss value for residual learning.

[0023] Optionally, in any of the preceding aspects, another implementation of the aspect provides the PUCS comprises first and second channels and generating the residual PUCS learning signal includes calculating a first channel difference between the first channel of the target PUCS residual image and the first channel of the estimated PUCS residual image; filtering the first channel difference with a first channel two-dimensional (2D) spatial filter; calculating a second channel difference between the second channel of the target PUCS residual image and the second channel of the estimated PUCS residual image; filtering the second channel difference with a second channel 2D spatial filter; and generating the PUCS loss value for residual learning based on the filtered first channel difference and the filtered second channel difference.

[0024] Optionally, in any of the preceding aspects, another implementation of the aspect provides the method further includes evaluating individual loss values for the filtered first channel difference and the filtered second channel difference; and generating the PUCS loss value for residual learning based on the individually evaluated loss values for the filtered first channel difference and the filtered second channel difference.

[0025] Optionally, in any of the preceding aspects, another implementation of the aspect provides the filtered first channel difference and the filtered second channel difference are individually evaluated for the loss values using one of LI loss, L2 loss, and structural similarity index measure (SSIM).

[0026] A fourth aspect relates to a blind image denoising system with strength control. The system includes a strength circuit configured to apply a transform function to a received input image to generate a value-scaled image; a blind denoiser circuit, configured to receive the value- scaled image and generate a value-scaled denoised image, the value-scaled denoised image having less image noise than the value-scaled image; and an inverse strength circuit configured to apply an inverse transform function to the value-scaled denoised image to generate a value-descaled output image.

[0027] Optionally, in any of the preceding aspects, another implementation of the aspect further includes a strength control circuit configured to provide one or more strength control parameters to the strength circuit and the inverse strength circuit.

[0028] Optionally, in any of the preceding aspects, another implementation of the aspect provides the one or more strength control parameters comprise transform curves and/or inputoutput mixing factors.

[0029] For the purpose of clarity, any one of the foregoing embodiments may be combined with any one or more of the other foregoing embodiments to create a new embodiment within the scope of the present disclosure.

[0030] These and other features, and the advantages thereof, will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings and claims.

BRIEF DESCRIPTION OF THE DRAWINGS

[0031] For a more complete understanding of this disclosure, reference is now made to the following brief description, taken in connection with the accompanying drawings and detailed description, wherein like reference numerals represent like parts.

[0032] FIG. 1 is a block diagram of a perceptual image denoising system according to an embodiment of the present disclosure. [0033] FIG 2 is a block diagram of perceptual color space conversion systems according to an embodiment of the present disclosure.

[0034] FIG. 3 is a block diagram of a 3 x3 matrix multiplier for color space conversion according to an embodiment of the present disclosure.

[0035] FIG. 4 is a block diagram of a nonlinear mapping module for inverse electro-optical transfer function (EOTF⁴) and electro-optical transfer function (EOTF) for each color space components according to an embodiment of the present disclosure.

[0036] FIG. 5 is a block diagram of a convolutional neural network (CNN) suitable for use in an embodiment of the present disclosure.

[0037] FIG. 6 is a block diagram of a training system for a perceptual image denoising system according to an embodiment of the present disclosure.

[0038] FIG. 7 is a block diagram of a perceptually uniform color space (PUCS) loss function calculator according to an embodiment of the present disclosure.

[0039] FIG. 8 is a block diagram of a two-dimensional (2D) spatial fdter according to an embodiment of the present disclosure.

[0040] FIG. 9 is a block diagram of a training system for an impulse noise detector according to an embodiment of the present disclosure.

[0041] FIG. 10 is a block diagram of an impulse noise detector according to an embodiment of the present disclosure.

[0042] FIG. 1 1 is a block diagram of a training system for a blur kernel estimator according to an embodiment of the present disclosure.

[0043] FIG. 12 is a block diagram of a blur kernel estimator according to an embodiment of the present disclosure. [0044] FIG 13 is a block diagram of an iterative extended image restoration system according to an embodiment of the present disclosure.

[0045] FIG. 14 is a block diagram of a blind image denoising system with strength control according to an embodiment of the present disclosure.

[0046] FIG. 15 is a flow chart of a method for training a perceptual image denoising system according to an embodiment of the present disclosure.

[0047] FIG. 16 is a flow chart of a method of perceptual image denoising according to an embodiment of the present disclosure.

[0048] FIG. 17 is a flow chart of a method of blind image denoising with strength control according to an embodiment of the present disclosure.

[0049] FIG. 18 is a diagram illustrating an image denoising and restoration element according to an embodiment of the present disclosure.

[0050] FIG. 19 illustrates an apparatus configured to implement one or more of the methods for perceptual image denoising and restoration as described herein.

[0051] FIG. 20 shows a contrast sensitivity function (CSF) for human vision.

[0052] FIG. 21 shows CSFs for achromatic I (luminance), red-green P (Protan), and yellowviolet (or yellow-blue) T (Tritan) channels of human vision.

DETAILED DESCRIPTION

[0053] It should be understood at the outset that, although illustrative implementations of one or more embodiments are provided below, the disclosed systems and/or methods may be implemented using any number of techniques, whether currently known or in existence. The disclosure should in no way be limited to the illustrative implementations, drawings, and techniques illustrated below, including the exemplary designs and implementations illustrated and described herein, but may be modified within the scope of the appended claims along with their full scope of equivalents.

[0054] Video enabled products (such as cameras, smartphones, drones, robots, and augmented reality (AR) and virtual reality (VR) devices) or network elements (such as network servers) may perform image denoising, deblurring, and restoration to improve image quality. Blind image denoising networks with supervised or self-supervised training may be used without explicit noise level estimation, however their performance may depend on an accuracy of their assumed noise models, which may provide poorer performance when applied to realistic image noises. Such image denoising, deblurring, and restoration systems may be difficult to adapt to different noise levels without retraining or explicit noise level estimation. They may not be adaptable to images of different sizes or resolution, for application to displays of different dimensions, resolution, and luminance, or to changes in ambient lighting or a user’s viewing distance, without degrading their perceptual performance. Such image denoising, deblurring, and restoration systems may be adjusted for varying strength of its function either by subjective tuning or objective optimization. They may not be extendable to stochastically solve other image restoration problems without retraining.

[0055] Systems according to the disclosure perform perceptually optimized training of blind image denoising networks (or perceptual denoisers) which can be extended to image restoration networks. They provide image denoising and extended image restoration networks that are based on machine learning algorithms. Without requiring retraining, systems according to the disclosure may be generalized to different image noise levels, adapted to task-specific use cases, and adjusted for image denoising strength, while still providing perceptually optimized performance. Systems according to the disclosure may be efficiently implemented in hardware, firmware, and/or software in devices for image or video capturing and processing.

[0056] Systems according to the disclosure may include:

• a blind Gaussian denoiser using bias-free CNN architecture;

• self-supervised training with perceptual loss function, to reduce a total perceptual loss, e g , LI loss, L2 loss, or structural similarity index measure (SSTM), in one or a mixture of color spaces, including the effects of contrast sensitivity functions;

• an image restoration module, extended by the Gaussian denoiser, that uses iterative rules adaptively adjusted for each image patch;

• a strength adjustment module for adjusting strength of the blind Gaussian denoiser using a value-to-value scalar transform at the denoiser’ s input and an inverse transform at its output, which adjusts the denoiser’ s strength by modifying transform curves and inputoutput mixing factors;

• a use case profile manager that stores parameter values for forward/inverse perceptual color space conversions and perceptual loss calculations, where sets of use case parameter values may be indexed by some or all of display dimensions, display resolution, display luminance, and/or viewing distance; and

• input and output domain conversion modules to convert pixel domain input representations to angle domain representations for denoising and to convert angle domain represented denoised images to pixel domain representation for output.

[0057] Systems according to the disclosure solve technical problems of improving image quality in denoised and restored images by: • perceptual training of blind denoising networks with loss functions that are defined according to image quality as perceived by the human vision system (HVS) and that are consistent with subjective tests for use case viewing conditions;

• being adaptable without retraining to application-specific use cases having diverse image sizes and/or resolutions, diverse display dimensions, resolutions and/or luminance levels, and diverse viewing distances and/or ambient lighting levels; and

• providing image denoising strength adjustment without retraining that may be subjectively tuned by users or may be objectively optimized by maximizing image quality (IQ) metrics for photographic applications, or computer vision (CV) metrics for CV applications.

[0058] By perceptually training blind denoising networks with loss functions defined according to image quality as perceived by the HVS and consistent with subjective tests for use case viewing conditions, systems according to the disclosure improve perceptual quality. By controlling blind denoising networks using input and output nonlinear mixing modules, systems according to the disclosure enable denoising strength to be subjectively tuned by user adjustment or objectively optimized by image quality metrics for human viewers or for accuracy in machine vision applications. By estimating a degradation matrix for each image patch using missing pixel locations, systems according to the disclosure enable removal of both realistic and synthetic noises, whether the noises are spatially variant or correlated, and whether they are mixed with other types of noises at varying levels. By estimating a degradation matrix for each image patch using blur kernel estimation, systems according to the disclosure enable removal of both realistic and synthetic blurs, whether the blurs are uniform or spatially variant and whether they are caused by optical defocus, dynamic objects, or camera motion. By extending supervised or self-supervised trained blind Gaussian denoisers into iterative image restoration networks, systems according to the disclosure achieve perceptually optimized, adaptive, and adjustable image denoising and restoration, without retraining.

[0059] FIG. 1 is a block diagram of a perceptual image denoising system 100 according to an embodiment of the present disclosure. The denoising system 100 receives a corrupted image 114a and generates a perceptually denoised image 114b. The corrupted image 114a may be received from a sensor in a video enabled product, retrieved from memory, or received via a communication link, such as from an external device or network. The perceptually denoised image 114b may be sent to a display, stored in memory, or sent via a communication link, including to an external device and/or over a network.

[0060] The denoising system 100 includes a strength scalar transform module (or strength circuit) 110a, a perceptual color space conversion (PC SC) module (or circuit) 104a, a pixel-to- angle domain conversion module 106a, a trained blind Gaussian denoiser 102, an angle-to-pixel domain conversion module 106b, an inverse PCSC module 104b, and an inverse strength scalar transform module 110b.

[0061] The strength scalar transform module 110a receives the corrupted image 114a and applies a value-to-value scalar transform to generate a scaled (or value-scaled) corrupted image (SCI). The PCSC module 104a converts the SCI from a standard color space (SCS) image format to a perceptually uniform color space (PUCS) image format to generate an SCI in the PUCS. The pixel-to-angle domain conversion module 106a converts the SCI in the PUCS from a pixel-pitch domain to an angular- frequency domain to generate an angular domain SCI in the PUCS The trained blind Gaussian denoiser 102 denoises the angular domain SCI in the PUCS to generate an angular domain denoised image in the PUCS. The angle-to-pixel domain conversion module 106b generates a pixel domain denoised image in the PUCS from the angular domain denoised image in the PUCS. The inverse PCSC module 104b converts the pixel domain denoised image in the PUCS from the PUCS to the SCS to generate a pixel domain denoised image in the SCS. The inverse strength scalar transform module 110b receives the pixel domain denoised image in the SCS and applies an inverse value-to-value scalar transform to generate the descaled (or value- descaled) perceptually denoised image 114b. The image 114b is referred to as a perceptually denoised image because denoising was performed in a perceptually uniform color space, rather than in a standard color space.

[0062] The PCSC module 104a, the pixel-to-angle domain conversion module 106a, the angle- to-pixel domain conversion module 106b, and the inverse PCSC module 104b are controlled by a production use case profde manager (or circuit) 108. The production use case profde manager 108 stores, for the PCSC module 104a and the inverse PCSC module 104b, parameter values for forward and inverse perceptual color space conversions. The production use case profde manager 108 also stores parameter values for the pixel-to-angle domain conversion module 106a and the angle-to-pixel domain conversion module 106b, the parameter values relating to individual use cases according to display dimensions, resolution, and luminance, as well as viewing distance and ambient lighting.

[0063] The strength scalar transform module 110a and the inverse strength scalar transform module 110b are controlled by a strength control module (or circuit) 112. The strength control module 112 controls transform curves (which control the spatial support of the denoising fdtering) and input-output mixing factors (which control the amplitude extent of the denoising).

[0064] The perceptual image denoising system 100 includes at least the PCSC module 104a, the inverse PCSC module 104b, and the trained blind Gaussian denoiser 102. The production use case profde manager 108, providing parameter values for forward and inverse perceptual color space conversion for the PCSC module 104a and the inverse PCSC module 104b, improves the performance of the perceptual image denoising system 100 without retraining or parameter changes of the trained blind Gaussian denoiser 102. The pixel-to-angle domain conversion module 106a and the angle-to-pixel domain conversion module 106b also improve the performance of the perceptual image denoising system 100 adaptable to application-specific use cases without retraining or parameter changes of the trained blind Gaussian denoiser 102. The production use case profile manager 108, providing parameter values to the pixel-to-angle domain conversion module 106a and the angle-to-pixel domain conversion module 106b, facilitates the performance improvement provided by those two modules. The strength scalar transform module 110a and the inverse strength scalar transform module 110b, under control of the strength control module 112, further tune or optimize the performance of the perceptual image denoising system 100 without retraining or parameter changes of the trained blind Gaussian denoiser 102.

[0065] FIG. 1 illustrates several embodiments of a perceptual image denoising system according to the disclosure. A first embodiment includes the PCSC module 104a, the trained blind Gaussian denoiser 102, and the inverse PCSC module 104b. A second embodiment adds the production use case profile manager 108 to the first embodiment. A third embodiment includes the PCSC module 104a, the pixel-to-angle domain conversion module 106a, the trained blind Gaussian denoiser 102, the angle-to-pixel domain conversion module 106b, and the inverse PCSC module 104b. A fourth embodiment adds the production use case profile manager 108 to the third embodiment. A fifth embodiment adds the strength scalar transform module 1 10a, the inverse strength scalar transform module 110b, and the strength control module 112 to any of the first four embodiments.

[0066] FIG. 2 is a block diagram of perceptual color space conversion systems 200 according to an embodiment of the present disclosure. As discussed with reference to FIG. 1, the PCSC module 104a, and the inverse PCSC module 104b are controlled by the production use case profile manager 108. The PCSC module 104a receives input pixel values in an SCS (or a camera color space) image format and generates output pixels values in a PUCS image format. The SCS input pixel values are converted to the linear XYZ color space (as defined by the International Commission on Illumination (CIE)). The linear XYZ pixel values are converted by a 3x3 matrix (shown in FIG. 3) into linear long, medium, and short (LMS) pixel values in an LMS color space. The linear LMS pixel values are mapped by a nonlinear inverse Electro-Optical Transfer Function (EOTF^-1) (shown in FIG. 4) into nonlinear L’, M’, and S’ pixel values, The nonlinear L’M’S’ pixel values are converted by another 3x3 matrix into individual luminance (Intensity), red-green (Protan or Cp), and yellow-blue (Tritan or C_T) (IPT) pixel values. The IPT pixel values form the output pixels values in the PUCS.

[0067] The inverse PCSC module 104b receives input pixels in the PUCS and generates output pixels in the SCS by a similar series of conversions. The IPT input pixels are converted by a 3x3 matrix into nonlinear L’M’S’ pixel values, which are mapped by a nonlinear EOTF into linear LMS pixel values, which are converted by another 3x3 matrix into linear XYZ pixel values, which are converted into the SCS output pixel values in the SCS (or the camera color space) image format.

[0068] FIG. 3 is a block diagram of a 3><3 matrix multiplier 300 for color space conversion according to an embodiment of the present disclosure. W is a width of the image in pixels and H is a height of the image in pixels. The nine weights used in the 1x1 convolution are different for XYZ-to-LMS, L’M’S ’-to-IPT, IPT-to-L’M’S’, and LMS-to-XYZ conversions.

[0069] FIG. 4 is a block diagram of a nonlinear mapping module (or circuit) 400 for inverse Electro-Optical Transfer Function (EOTF^-1) and electro-optical transfer function (EOTF) for each color space component according to an embodiment of the present disclosure. W and H are, again, width and height of the image in pixels. K is a number of intermediate channels generated from each of the W^XH><3 input pixel values and there are K weights and biases associated with each of the K channels applied in the first 1x1 convolution. K is also the number of channels in the ReLU or Sigmoid layer. There are W^XH^X3 pixel values at the input of the first 1x1 convolution and for each input value there are K output values associated with the K channels with K weights and K biases. Therefore, out of its 3><3xK weights, 3* lxK of them are set to one of the K assigned weight values and the other 3x2xK weights are set to 0, while all its 3xK biases are set to one of the K assigned bias values. At the output of the first 1x1 convolution, there are W^xHx3xK weighted and biased values as the inputs of the ReLU or Sigmoid layer, which generates W^xHx3xK intermediate values at its output as the input of the second 1x1 convolution. For each of the W^xHx3 output value of the second 1x1 convolution, there is a group of K corresponding input intermediate values associated with the K channels. The second 1x1 convolution simply sums up all groups of K input intermediate values associated with the K channels for each of its W^xHx3 output values. Therefore, out of its 3xKx3 weights, 3xKxl of them are set to 1 and the other 3^xKx2 weights are set to 0, while all its 3 biases are set to 0. The electro-optical transfer function (EOTF) is the transfer function having a nonlinear picture or video signal, such as L’M’S’, as input and converting it into the linear light output signal, such as LMS, for subsequent processing or for a display. The inverse Electro-Optical Transfer Function (EOTF^-1), having a linear light input signal, such as LMS as input, and converting it into the nonlinear picture or video output signal, such as L’M’S’ for subsequent processing, is an inverse transfer function of the EOTF. According to an embodiment of the present disclosure, the nonlinear mapping module 400 can closely match the EOTF'¹ and EOTF transfer functions using piecewise linear approximation to achieve arbitrary accuracy, where a nonlinear function is approximated by a series of linear segments that follow the local slope of the function. The number of channels K are determined by the number of linear segments for the approximation, subject to accuracy requirements and resource limitations. The values of biases are determined by the locations of consecutive break points for each linear segment of the approximation. The values of weights are determined by the slopes of each linear segment of the approximation. Different sets of parameter values for weights, biases, and number of channels are applied in the nonlinear mapping module 400 in order to realize the EOTF'¹ and EOTF transfer functions, respectively.

[0070] FIG. 5 is a block diagram of a convolutional neural network (CNN) 500 suitable for use in an embodiment of the present disclosure. The CNN 500 becomes an image denoising convolutional neural network (DnCNN) after training in the training system according to the disclosure of FIG. 6, discussed further below. In other embodiments, the CNN 500 may be a Recurrent CNN, UNet, DenseNet, or other networks having CNN or bias-free CNN architecture (where all biases are removed or set to zero). The CNN 500 comprises an input convolution (Conv) and Rectified Linear Unit (ReLU) (Conv + ReLU) layer, a plurality of intermediate Conv, batch normalization (BN), and ReLU (Conv + BN + ReLU) layers, and an output Conv layer.

[0071] FIG. 6 is a block diagram of a training system 600 for a perceptual image denoising system according to an embodiment of the present disclosure. As discussed in greater detail below, a pair of SCS training images (a clean SCS target image and a noisy SCS input image) are input to the training system 600 and are converted into a clean PUCS target image and a noisy PUCS input image by two PCSC modules 104a. A denoiser-in-training 604 generates an estimated PUCS residual image 606, which is processed through a perceptual loss function calculator 608 to train the denoiser-in-training 604. If training is determined complete, model parameters of the denoiser- in-training 604 are output from the training system 600 to be stored for use as model parameters of the trained blind Gaussian denoiser 102 in a production system.

[0072] Pairs of training images may be obtained or generated in several ways. One method is to capture two images of a scene simultaneously, a clean image captured by a reference imaging device (e.g., a camera or other imaging device) and a noisy image captured by a type of imaging device the denoiser is being trained for. Another method is to train the denoiser for noise of certain types and characteristics by corrupting clean images with controlled noise having the desired characteristics to generate associated noisy images.

[0073] A training use case profile manager 602 loads parameter values (for forward perceptual color space conversions) into first and second PCSC modules 104a and parameter values for inverse color space conversions into an inverse PCSC module 616. The training use case profile manager 602 also loads parameters for 2D spatial filters and loss norm definition into a perceptual loss function calculator 608.

[0074] The first PCSC module 104a receives a clean SCS target image in an SCS image format and converts it to a clean PUCS target image in a PUCS image format. The second PCSC module 104a receives a noisy SCS input image in the SCS image format and converts it to a noisy PUCS input image in the PUCS image format. The noisy PUCS input image is input to the perceptual denoiser-in-training 604, which generates an estimated PUCS residual image 606. The perceptual denoiser-in-training 604 is a blind Gaussian denoising network, which can suitably be embedded inside the trained blind Gaussian denoiser 102 of the perceptual image denoising system 100, having a structure such as the CNN 500.

[0075] Training of the perceptual denoiser-in-training 604 occurs through the operation of the perceptual loss function calculator 608 (discussed in more detail with reference to FIG. 7). The clean PUCS target image is subtracted from the noisy PUCS input image in node 610 to generate a target PUCS residual image 612, which is a first input to the perceptual loss function calculator 608. The estimated PUCS residual image 606 is a second input to the perceptual loss function calculator 608. The perceptual loss function calculator 608 generates a PUCS loss value for residual learning 614, which is applied to update the model parameters of the perceptual denoiser- in-training 604 using a neural network training algorithm to improve its denoising performance. [0076] The estimated PUCS residual image 606 is subtracted from the noisy PUCS input image in node 608 to generate a denoised PUCS image. The denoised PUCS image is received by an inverse PCSC module 616, which converts it to a perceptually denoised SCS image. When training of the perceptual denoiser-in-training 604 is complete, model parameters of the trained perceptual denoiser may be provided at an output of the training system 600, and may be configured for storage as model parameters of the trained blind Gaussian denoiser 102 in a production system, such as the perceptual image denoising system 100.

[0077] FIG. 7 is a block diagram of the perceptual loss function calculator 608 according to an embodiment of the present disclosure. Because the perceptual loss function calculator 608 operates in the PUCS, it functions as a data augmentation technique for training the perceptual denoiser-in-training 604 to reduce a total perceptual loss (e.g., LI loss, L2 loss, or SSIM index) in the PUCS. In some embodiments, the perceptual loss function calculator 608 is adaptable to individual use cases by receiving from the training use case profile manager 602 the parameter values relating to perceptual loss calculation in individual use cases according to display dimensions, resolution, and luminance, as well as viewing distance and ambient lighting.

[0078] The target PUCS residual image 612 and the estimated PUCS residual image 606 are input at first and second inputs, respectively, of the perceptual loss function calculator 608. In this embodiment, the PUCS images comprise pixel values in the TPT color space, with Intensity, Cp, and CT components for each pixel. The I, Cp, and CT components may be referred to as PUCS channels. Difference values 702 are calculated from the I, Cp, and CT components in each image. The I, Cp, and CT differences are individually fdtered by 2D spatial fdters 704 (shown in FIG. 8) and the results are individually evaluated for loss values by a norm unit 706 using one of LI loss, L2 loss, or SSIM index, or a combination thereof. The individually evaluated I, Cp, and CT loss values are then combined by a mixer unit 708 to produce the PUCS loss value for residual learning 614 at the output of the perceptual loss function calculator 608.

[0079] The 2D spatial fdters 704, the norm unit 706, and the mixer unit 708 may receive parameter values from a training use case profde manager 602. Such parameters may be tailored to certain use cases and viewing conditions (e.g., smart phones, desktop computers, or big-screen TVs) by including typical values of display dimensions, viewing conditions, and/or viewing distances. Such application- specific perceptual training for image denoising and restoration systems may improve perceived image quality by, for example, reducing visual artifacts such as noisy, grainy, jagged, and/or blurry images for specific use cases.

[0080] FIG. 8 is a block diagram of a two-dimensional (2D) spatial filter 704 according to an embodiment of the present disclosure. W and H are the width and height of the image in pixels and N^XN is the size of the 2D convolution kernel applied to the pixels of the image. For each output pixel value, each of the three color space components is a 2D convolution of the corresponding input color space component in the N*N pixel neighborhood and the N*N convolution kernel. There is no cross-component convolution between the input pixel values and the output pixel values, so there are a total of N^XN><3 parameter values for weights for the 2D convolution kernel for each of the three color space components. [0081] FIG. 9 is a block diagram of a training system 900 for an impulse noise detector according to an embodiment of the present disclosure. Random valued impulse noise (or RVIN, which replaces actual pixel values with random values), salt-and-pepper impulse noise (or SPIN, which replaces actual pixel values with a random value of zero or one), and pixel dropping (which replaces some pixels with a fixed value of zero) may be referred to as impulse noise.

[0082] The training system 900 receives as inputs a clean input patch (or image portion of a predetermined size) and a groundtruth impulse map. The groundtruth impulse map is a patch of simulated or captured noise of the same size as the clean input patch, and may include one or more of the impulse noise types discussed above. A pixel dropping module 902 combines the clean input patch and the groundtruth impulse map to generate a simulated noisy patch that is input to an impulse detector in training 904. As discussed in further detail with reference to FIG. 10, the impulse detector in training 904 may be a CNN dense prediction network that generates an estimated impulse map from the simulated noisy patch. A loss function calculator 906 receives the groundtruth impulse map and the estimated impulse map, compares the two, and generates a supervised training signal that is applied to update the model parameters of the impulse detector in training 904 using a training algorithm or an optimization algorithm to improve its impulse noise estimation performance.

[0083] The training system 900 may be operated iteratively on the same or on varied groundtruth impulse maps and clean input patches until a measure of similarity between the groundtruth impulse map and the estimated impulse map reaches a predetermined value. At this point, the impulse detector in training 904 may be considered adequately trained for use in a production image restoration system according to the disclosure. [0084] FIG 10 is a block diagram of an impulse noise detector 1000 according to an embodiment of the present disclosure. The impulse noise detector 1000 is suitable for use as the impulse detector in training 904. As discussed above, the impulse noise detector 1000 is a CNN dense prediction network. In other embodiments, the impulse noise detector 1000 may use a support vector machine, a fully connected neural network, or other classical or deep learning-based methods. The impulse noise detector 1000 comprises an input Conv and ReLU (Conv + ReLU) layer, a plurality of intermediate Conv, BN, and ReLU (Conv + BN + ReLU) layers, and an output Conv layer followed by a Sigmoid layer.

[0085] FIG. 11 is a block diagram of a training system 1100 for a blur kernel estimator according to an embodiment of the present disclosure. The training system 1100 receives as inputs a sharp input patch and a groundtruth blur kernel label. The groundtruth blur kernel label may be an indicator directly associated with one member from a finite number of blur kernel candidates, where each blur kernel candidate may be either specified by the characteristics of the corresponding blur kernel (with parameters such as blur radius, motion length, and motion orientation), or captured by actual measurements in a lab or in the field. The blur kernel candidates to be labelled may be caused by defocus blur due to lens settings, motion blur due to dynamic objects or camera motion, or a combination thereof. An image blurring module 1102 convolves the sharp input patch with a synthetic blur kernel specified by the groundtruth blur kernel label to generate a simulated blurry patch that is input to a blur kernel estimator in training 1104. As discussed in further detail with reference to FTG. 12, the blur kernel estimator in training 1 104 may be a CNN classification network that generates an estimated blur kernel candidate from the simulated blurry patch. A loss function calculator 1106 receives the groundtruth blur kernel label and the estimated blur kernel candidate, compares the two, and generates a supervised training signal that is applied to update the model parameters of the blur kernel estimator in training 1104 using a training algorithm or an optimization algorithm to improve its blur kernel candidate estimation performance.

[0086] The training system 1100 may be operated iteratively on the same or on varied groundtruth blur kernel labels and sharp input patches until a measure of similarity between the groundtruth blur kernel label and the estimated blur kernel candidate reaches a predetermined value. At this point, the blur kernel estimator in training 1104 may be considered adequately trained to be used in a production image restoration system according to the disclosure.

[0087] FIG. 12 is a block diagram of a blur kernel estimator 1200 according to an embodiment of the present disclosure. The blur kernel estimator 1200 is suitable for use as the blur kernel estimator in training 1104. As discussed above, the blur kernel estimator 1200 is a CNN classification network. In other embodiments, the blur kernel estimator 1200 may use a support vector machine classifier, a regression neural network, or other classical or deep learning-based methods. The blur kernel estimator 1200 comprises a plurality of Conv, ReLU, and maximum pooling (Max-Pool) (Conv + ReLU + Max-Pool) layers, and an output Conv + ReLU layer followed by a fully connected flattening layer followed by a Soft-Max activation function layer.

[0088] FIG. 13 is a block diagram of an iterative extended image restoration system 1300 according to an embodiment of the present disclosure. The system 1300 includes a degradation matrix module 1302 having a first input that receives an input image or patch (a portion or subset of pixels from a larger image) that may be corrupted in various ways or need other transformation. The degradation matrix module 1302 also has a second input which receives degradation parameters (estimated or assigned, as discussed below) for a degradation matrix that characterizes the image degradation, and is applied to the input image or patch to generate a characterized input image or patch for denoising.

[0089] In some applications, the first input may need demosaicing reconstruction or superresolution enhancement, which are deterministic corruptions (e.g., deterministic pixel missing) and may be characterized with precalculated assigned degradation parameters at the second input to the degradation matrix module 1302. In other applications, the first input may be corrupted by impulse noise and/or random pixel dropping, which may be characterized by passing the first input through the trained CNN impulse noise detector 1000 (discussed with reference to FIGS. 9 and 10) to produce an estimated impulse map for use as the estimated degradation parameters for the second input to the degradation matrix module 1302.

[0090] In other applications, the first input may be corrupted by image blurring, which may be characterized by passing the first input through the trained CNN blur kernel estimator 1200 (discussed with reference to FIGS. 11 and 12) to produce an estimated blur kernel candidate, which may be converted for use as the estimated degradation parameters for the second input to the degradation matrix module 1302.

[0091] The iterative extended image restoration system 1300 further includes an iterative image processing system 1304, which includes a termination evaluation module 1306 and the perceptual denoising system 100 with inputs from the production use case profile manager 108 and the strength control module 112 (as discussed with reference to FIGS. 1-5). The iterative image processing system 1304 also receives the estimated or assigned degradation parameters for the degradation matrix that characterizes the image degradation. The iterative image processing system 1304 initially receives the characterized image or patch as an input and generates an initial intermediate input to the perceptual denoising system 100, which generates a perceptually denoised image. The termination evaluation module 1306 receives the perceptually denoised image from the perceptual denoising system 100 and evaluates it for a termination criterion. If the image does not meet the termination criterion, the termination evaluation module 1306 enables the iterative image processing system 1304 to generate a subsequent intermediate input to the perceptual denoising system 100 for another iteration of denoising. If the perceptually denoised image meets the termination criterion, the termination evaluation module 1306 enables the iterative image processing system 1304 to output a perceptually denoised and restored image.

[0092] In some embodiments, the termination criterion comprises determining a value representing a difference between a current perceptually denoised image and a perceptually denoised image generated in a previous iteration of denoising. If the value is less than a predetermined threshold amount, the perceptually denoised image meets the termination criterion. If the value is greater than the predetermined threshold amount, the perceptually denoised image does not meet the termination criterion.

[0093] FIG. 14 is a block diagram of a blind image denoising system 1400 with strength control according to an embodiment of the present disclosure. The blind image denoising system 1400 receives a corrupted input image 1402. The corrupted input image 1402 is input to a strength scalar transform module (T) 1404, which applies a value-to-value scalar transform to generate a scaled corrupted input image 1406. The scaled corrupted input image 1406 is input to an end-to- end blind image denoiser 1408, which generates a scaled denoised image from the scaled corrupted input image 1406. The scaled denoised image is received at a first input of a mixing module 1410. The scaled corrupted input image 1406 is received at a second input of the mixing module 1410. The two input images are mixed to generate a scaled adjusted denoised image, which is input to an inverse strength scalar transform module (T'¹) 1412. The inverse strength scalar transform module 1412 applies an inverse value-to-value scalar transform to the scaled adjusted denoised image to generate an descaled adjusted denoised image. The descaled adjusted denoised image is received at a first input of a mixing module 1414. The original corrupted input image 1402 is received at a second input of the mixing module 1414. The two input images are mixed to generate a descaled denoised output image 1416.

[0094] The strength scalar transform module 1404, the mixing module 1410, inverse strength scalar transform module 1412, and the mixing module 1414 are controlled by a strength control module 1418. The strength control module 1418 controls transform curves (which control spatial support of denoising filtering) and input-output mixing factors (which control the amplitude extent of the denoising). The inserted value-to-value scalar transforms 1404 and 1412 are used to stabilize or diversify noise variances seen by the end-to-end blind image denoiser 1408, which will automatically adjust its spatial support of denoising filtering for strength adjustment. The inputoutput mixing performed in the scaled image domain by mixing module 1410 and the descaled image domain by mixing module 1414 will adjust the amplitude extent of the denoising. The two mixing modules 1410 and 1414 can adjust the strength of denoising in different ways, due to the nonlinear nature of the forward and inverse scalar transforms performed by the modules 1404 and 1412, so including both of them can provide the users with more degree of freedom for denoising strength fine-tuning or optimization.

[0095] FIG. 15 is a flow chart for a method 1500 of training a perceptual image denoising system according to an embodiment of the present disclosure. For ease in understanding, the method 1500 is described as though performed using the elements of the training system 600, although the method is not limited solely to the architecture of the training system 600. [0096] Tn step 1502, parameter values for an EOTF ¹ function and XYZ-to-LMS and L’M’S’- to-IPT matrices are received from the training use case profde manager 602 and used in configuring the first and second PCSC modules 104a. In step 1504, parameter values for IPT 2D spatial filters and loss mixing factors among LI loss, L2 loss, or SSIM index are received from the training use case profile manager 602 and used in configuring the perceptual loss function calculator 608. In step 1506, a clean SCS target image is received and converted, using the first PCSC module 104a, into a clean PUCS target image. In step 1508, a noisy SCS input image is received and converted, using the second PCSC module 104a, into a noisy PUCS input image.

[0097] In step 1510, the perceptual denoiser-in-training 604 generates an estimated PUCS residual image from the noisy PUCS input image and the input to the perceptual loss function calculator 608. In step 1512, the perceptual denoiser-in-training 604 is trained using a neural network training algorithm to improve its denoising performance by updating denoiser model parameters according to a PUCS loss value for residual learning. The PUCS loss value for residual learning is generated by the perceptual loss function calculator 608. When training is complete, in step 1514, the training system 600 provides model parameters of the now-trained perceptual denoiser 604, configured for storage as model parameters of the trained blind Gaussian denoiser 102 in a production system.

[0098] FIG. 16 is a flow chart for a method 1600 of perceptual image denoising according to an embodiment of the present disclosure. For ease in understanding, the method 1600 is described as though performed using the elements of the perceptual image denoising system 100, although the method is not limited solely to the architecture of the perceptual image denoising system 100.

[0099] In step 1602, transform curves and input-output mixing factors are received from the strength control module 112 and used in configuring the strength scalar transform module 110a and the inverse strength scalar transform module 1 1 Ob. Tn step 1604, parameter values for forward and inverse perceptual color space conversions are received from the production use case profde manager 108 and used in configuring the PCSC module 104a and the inverse PCSC module 104b. Also received in step 1604 from the production use case profile manager 108 are parameter values relating to individual use cases according to display dimensions, resolution, and luminance, as well as viewing distance and ambient lighting, which are used in configuring the pixel-to-angle domain conversion module 106a and the angle-to-pixel domain conversion module 106b.

[00100] In step 1606, a corrupted image is received and scaled by the strength scalar transform module 110a using a value-to-value scalar transform to generate a scaled corrupted image. In step 1608, the scaled corrupted image is converted from a standard color space to a PUCS by the perceptual color space conversion module 104a. In step 1610, the scaled corrupted image in the PUCS is converted from an image in a pixel-pitch domain to an image in an angular-frequency domain by the pixel-to-angle domain conversion module 106a. In step 1612, the scaled corrupted PUCS image in the angular-frequency domain is denoised by the trained blind Gaussian denoiser 102.

[00101] In step 1614, the denoised scaled PUCS image in the angular-frequency domain is converted back to an image in the pixel-pitch domain by the angle-to-pixel domain conversion module 106b. In step 1616, the denoised scaled PUCS image in the pixel-pitch domain is converted to an image in a standard color space by the inverse PCSC module 104b. In step 1618, the denoised scaled standard color space image is descaled by the inverse strength scalar transform module 110b using an inverse value-to-value scalar transform to generate the descaled perceptually denoised image 114b. [00102] Similar to the discussion above with reference to FIG. 1 , FIG. 16 illustrates several methods of perceptual image denoising according to the disclosure. A first embodiment includes the steps 1608, 1612, and 1616. A second embodiment adds the elements of step 1604 relating to configuring the PCSC module 104a and the inverse PCSC module 104b. Athird embodiment adds steps 1610 and 1614 to either the first or second embodiment. A fourth embodiment adds the elements of step 1604 relating to configuring the pixel-to-angle domain conversion module 106a and the angle-to-pixel domain conversion module 106b to the third embodiment. A fifth embodiment adds steps 1602, 1606, and 1618 to any of the first four embodiments.

[00103] FIG. 17 is a flow chart for a method 1700 of blind image denoising with strength control according to an embodiment of the present disclosure. For ease in understanding, the method 1700 is described as though performed using the blind image denoising system 1400 with strength control, although the method is not limited solely to the architecture of the system 1400.

[00104] In step 1702, transform curves and input-output mixing factors are received from the strength control module 1418 and used in configuring the strength scalar transform module 1404, the mixer 1410, the inverse strength scalar transform module 1412, and the mixer 1414. In step 1704, a corrupted image is received and scaled by the strength scalar transform module 1404 using a value-to-value scalar transform to generate a scaled corrupted image. In step 1706, the scaled corrupted image is denoised by the end-to-end blind image denoiser 1408 to generate a scaled denoised image. In step 1708, the scaled denoised image is mixed with a scaled corrupted image, descaled by an inverse value-to-value scalar transform, and then mixed with a corrupted image to generate the descaled denoised output image 1416, by the mixer 1410, the inverse strength scalar transform module 1412, and the mixer 1414, respectively. [00105] FIG. 18 is a diagram illustrating an image denoising and restoration element 1800 according to an embodiment of the present disclosure. The image denoising and restoration element 1800 can be any image denoising and restoration device such as, but not limited to, a video enabled product or a network server. In some embodiments, the image denoising and restoration element 1800 may also be referred to as a network device. The image denoising and restoration element 1800 includes receiver units (RX) 1820 or receiving means for receiving data via ingress ports 1810. For example, the ingress ports 1810 may connect to one or more cameras or other image capturing or retrieving devices. The image denoising and restoration element 1800 also includes transmitter units (TX) 1840 or transmitting means for transmitting via data egress ports 1850. For example, the egress ports 1850 may connect to one or more displays or other image transmitting or storing devices.

[00106] The image denoising and restoration element 1800 includes a memory 1860 or data storing means for storing the instructions and various data. The memory 1860 can be any type of, or combination of, memory components capable of storing data and/or instructions. For example, the memory 1860 can include volatile and/or non-volatile memory such as read-only memory (ROM), random access memory (RAM), ternary content-addressable memory (TCAM), and/or static random-access memory (SRAM). The memory 1860 can also include one or more disks, tape drives, and solid-state drives. In some embodiments, the memory 1860 can be used as an over-flow data storage device to store programs when such programs are selected for execution, and to store instructions and data that are read during program execution.

[00107] The image denoising and restoration element 1800 has one or more processor(s) 1830 or other processing means (e.g., central processing unit (CPU) or graphics processing unit (GPU)) to process instructions. The processor 1830 may be implemented as one or more CPU chips, cores (e g., as a multi-core processor), field-programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), and digital signal processors (DSPs). The processor 1830 may include hardware accelerators configured to implement the CNNs or other neural networks as described herein.

[00108] The processor 1830 is communicatively coupled via a system bus with the ingress ports 1810, the RX 1820, the TX 1840, the egress ports 1850, and the memory 1860. The processor 1830 can be configured to execute instructions stored in the memory 1860. Thus, the processor 1830 provides a means for performing any computational, comparison, determination, initiation, configuration, or any other action corresponding to the claims when the appropriate instruction is executed by the processor. In some embodiments, the memory 1860 can be memory that is integrated with the processor 1830. The hardware accelerators may be configured with parameters stored in the memory 1860.

[00109] In various embodiments, the memory 1860 stores a perceptual denoising and restoration module 1870. The perceptual denoising and restoration module 1870 includes data and executable instructions for implementing the disclosed perceptual denoising and restoration embodiments. For instance, the perceptual denoising and restoration module 1870 can include instructions for implementing the systems and methods described with reference to FIGS. 1-5, 9- 13, and 16.

[00110] In various embodiments, the memory 1860 stores a perceptual denoiser training module 1880. The perceptual denoiser training module 1880 includes data and executable instructions for implementing the disclosed training system for a perceptual image denoising system. For instance, the perceptual denoiser training module 1880 can include instructions for implementing the systems and methods described with reference to FIGS. 6-8 and 15. [00111] Tn various embodiments, the memory 1860 stores a strength control blind image denoising system 1890. The strength control blind image denoising system 1890 includes data and executable instructions for implementing the disclosed blind denoiser with strength control. For instance, the strength control blind image denoising system 1890 can include instructions for implementing the systems and methods described with reference to FIGS. 14 and 17. The inclusion of the perceptual denoising and restoration module 1870, the perceptual denoiser training module 1880, and/or the strength control blind image denoising system 1890 substantially improves the functionality of the image denoising and restoration element 1800 by enabling methods of perceptual image denoising, restoration, and perceptual image denoiser training.

[00112] FIG. 19 illustrates an apparatus 1900 configured to implement one or more of the systems and methods for perceptual denoising, restoration, and/or perceptual denoiser training as described herein. For example, the apparatus 1900 is configured to implement one or more of the systems and methods described with reference to FIGS. 1-17. The apparatus 1900 may be implemented in the image denoising and restoration element 1800. The apparatus 1900 comprises means 1902 for perceptual image denoising, as described with reference to FIGS. 1-5 and 16, and/or iterative extended image restoration, as described with reference to FIGS. 9-13. The apparatus 1900 may additionally or alternatively comprise means 1904 for perceptual image denoiser training, as described with reference to FIGS. 6-8 and 15. The apparatus 1900 may further additionally or alternatively comprise means 1906 for blind image denoising with strength control, as described with reference to FIGS. 14 and 17.

[00113] The disclosed embodiments may be a system, an apparatus, a method, and/or a computer program product at any possible technical detail level of integration. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present disclosure. The computer readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device.

[00114] While several embodiments have been provided in the present disclosure, it may be understood that the disclosed systems and methods might be embodied in many other specific forms without departing from the spirit or scope of the present disclosure. The present examples are to be considered as illustrative and not restrictive, and the disclosure is not to be limited to the details given herein. For example, the various elements or components may be combined or integrated in another system or certain features may be omitted, or not implemented.

[00115] In addition, techniques, systems, subsystems, and methods described and illustrated in the various embodiments as discrete or separate may be combined or integrated with other systems, modules, techniques, or methods without departing from the scope of the present disclosure. Other items shown or discussed as coupled or directly coupled or communicating with each other may be indirectly coupled or communicating through some interface, device, or intermediate component whether electrically, mechanically, or otherwise. Other examples of changes, substitutions, and alterations are ascertainable by one skilled in the art and may be made without departing from the spirit and scope disclosed herein.

Contrast Sensitivity Function

[00116] In FIG 20, a contrast sensitivity function (CSF) for human vision is shown to vary with retinal illuminance, in units of troland (Td), which is equal to object luminance (in candela per square meter (cd/m²) times pupillary aperture area (in square millimeters (mm²). The graph in FIG. 20 shows a family of curves representing different adaptation states, from very dark (0.0009 Td) to very bright (900 Td). Each CSF curve for a specified retinal illuminance shows a graph of the dependence of contrast sensitivity (in percentage) with respect to angular spatial frequency in units of cycles per degree. The CSF curve at 90 Td, which is representative of viewing electronic displays, peaks at about 4 cycles per degree. The 90 Td CSF curve has fallen to a contrast sensitivity of 1 at about 60 cycles per degree. Human vision could not perceive angular spatial frequencies greater than this value, which is roughly considered as the limit of angular discrimination of normal human vision. This is consistent with the generally accepted human visual acuity which is about 1 arc minute under well-illuminated environments for people with 20/20 vision. Therefore, a display does not need to reproduce visual details finer than this angular spatial frequency, which limits the maximum resolution that needs to be reproduced. The peak of the 90 Td CSF curve has a contrast sensitivity for luminance differences less than 1%. Human vision could not discern luminance differences less than this percentage, which limits the number of gray levels that need to be reproduced by a display.

[00117] In FIG. 21, CSFs for achromatic I (luminance), red-green P (Protan), and yellow- violet (or yellow-blue) T (Tritan) channels of human vision are shown to vary with object illuminance (in cd/m²). The CSF curve at 200 cd/m², which is representative of viewing electronic displays, peaks at about 2 cycles per degree. For 200 cd/m² illuminance level, the luminance CSF is bandpass in nature, with peak sensitivity around median angular spatial frequencies. This function will approach 0 at zero cycles per degree, illustrating the tendency for the visual system to be insensitive to uniform fields. It will also approach 0 at about 60 cycles per degree, the point at which detail can no longer be resolved by the perception of human eyes. The chromatic CSFs are of a low-pass nature and have significantly lower cutoff frequencies, which also indicate the reduced sensitivity of chromatic information for minute details, edges, and textures Resolution and Viewing Distance

[00118] In video terminology, resolution concerns the maximum number of line pairs (or cycles) that can be resolved on the display screen. Resolution in a digital image system is bounded by the number of pixels (or samples) across the image width and height. Non-ideal electronic and optical effects can cause undesirable effects to diminish even within the bounds imposed by sampling. For digital imaging, resolution is related to perceived sharpness and can be expressed in terms of image spatial frequency, in units of cycles per picture height vertically and cycles per picture width horizontally, which are limited by one half of the vertical and horizontal spatial sampling rates, respectively. A general rule to determine the optimum viewing distance of a display is where its pixel pitch subtends an angle of about 1/60°. If displayed images are viewed closer than the optimum distance, the pixel structure is likely to be discernible by the viewers, and the perceived quality of the images will suffer. If displayed images are viewed much farther than the optimum distance, each individual pixel subtends an angle well lower than the limit of angular discrimination of normal human vision. This will cause unnecessary loss of perceived image resolution and decrease of screen viewing angle, which is both wasteful and undesirable.

Viewing Angle and Spatial frequency

[00119] To calculate a vertical and a horizontal viewing angle

and

subtended by a display with height H and width W, respectively, at a viewing distance D, the following equations can be used, assuming the viewer is located right in front of the display screen. (in radians) for smaller angles (in radians) for smaller angles

[00120] Suppose there are

pixels on the display screen for displaying a digital image with the same resolution, then each pixel on the display screen will subtend a vertical and a horizontal viewing angle

and

, respectively, which can be calculated by the following equations, assuming all pixels are perceived by the viewer as uniform in size. The parameters NH and Nw can be called the vertical and horizontal spatial sampling rates, respectively, in samples per picture height and samples per picture width.

[00121] To achieve the optimal viewing distance

where each pixel subtends an angle of 1/60°, a linear relationship with vertical and horizontal pixel pitch H/NH and W/Nw, respectively, can be found from the following equations. In many displays, each pixel is square in shape and the vertical and horizontal pixel pitch will be the same. In such cases the

values calculated from the two equations will also be the same.

[00122] Similarly, with known display dimension and viewing distance, the vertical and horizontal image spatial frequencies

and

(in cycles per picture height and cycles per picture width, respectively) defined for a digital image being displayed can be related to the vertical and horizontal angular spatial frequencies and (in cycles per degree) perceived by a viewer with

the following equations.

[00123] It should be noted that the conversion equations above are not directly related to the image resolution and pixel pitch. However, when designing the 2D spatial filters according to the contrast sensitivity functions, instead of image spatial frequencies and

, the 2D spatial

frequency responses are usually specified in normalized spatial frequencies ω_H and ω_W (in radians per sample) with respect to spatial sampling rates NH and NW, respectively. ω_H = 2π υ_H / N_H = 360 × (H/D/N_H) × μ_H (in radians per pixel)

[00124] Most image denoising and restoration are being applied on full-size images in the pixel domain, with image processing irrespective to image / screen resolution (e.g., NH and NW), display dimensions (e.g., H and W), viewing distance (e.g., D), screen illuminance (in nits), and ambient lighting (lux). However, contrast sensitivity and visual acuity are closely related to human vision perception, and both are defined in terms of angular spatial frequencies measured in cycles per degree. Therefore, the perceived image quality strongly depends on angular width of pixel pitch seen by viewers. By converting between the pixel domain, where denoising networks are trained and operated, and the angular spatial frequency domain, where perceptual loss is accessed, calculation of perceptual loss function can be achieved accordingly. [00125] In order to calculate the perceptual loss function values for the perceptual training of image denoising networks, 2D spatial filters for I, P, and T channels, with 2D spatial frequency responses following their respective CSF curves, are required. To specify the 2D frequency responses of each of the 2D spatial filters, separable filters with the 1D frequency responses shown in the following equations can be used, which are related to the respective CSFs with a suitable frequency scaling. Only equations for the vertical dimension are shown here for simplicity.

where SH = D x NH I H I 360 is the frequency scaling factor, Fk() are the ID spatial frequency responses, CSFk () are the contrast sensitivity functions, and are monotonically non

decreasing mapping function, for one of the I, P, and T channels. Suitable

CSFk () and for the adaptation states of illuminance levels matched to the typical application-specific use cases should be applied for optimal perceived quality of the image denoising and restoration systems. can be linear, affine, or non-linear mappings, and same or different mappings can be applied for the I, P, and T channels.

[00126] With the disclosed method, perceptual loss functions can be defined specific to certain typical use cases and viewing conditions, e.g., smart phones, PCs, big-screen TVs, by assigning typical values of display dimensions / resolution, viewing conditions, and distances. Therefore, application-specific perceptual training for image denoising and restoration methods can substantially optimize perceived image quality while avoiding visual artifacts, e.g., noisy, grainy, jaggy, and blurry images, for various use cases.

Pixel to Angular Spatial Frequency Domain Conversion

[00127] To achieve adaptivity to application- specific use cases, head trackers and lux meters can be used to measure viewing distance and ambient lighting, and then select or blend corresponding parameter profiles according to assigned prevailing use case. According to the disclosed method for converting between the pixel domain and the angular spatial frequency domain by frequency scaling, CSFs selecting, and mapping, perceptual training of the denoising network can be performed for each parameter profile assigned for each defined applicationspecific use case. One method to manage different parameter profiles assigned for different use cases is to perform perceptual training of the denoising network with different frequency scaling, CSFs selecting, and mapping, according to the disclosed method. In this approach, multiple sets of frequency- scaled filter frequency responses are used during multiple training processes, while no input / output image resizing are performed during inference (or production). Each perceptual training process for the denoising network are performed on the original image resolution and size, with the 2D spatial filters with frequency- scaled filter frequency responses corresponding to each parameter profile assigned for each use case, for each of the I, P, and T channels. There is no need to resize input / output image resolution and size during inference (or production), but perceptual training needs to be performed multiple times and multiple sets of trained denoising network parameters need to be stored and retrieved.

[00128] Another method to manage different parameter profiles while saving costly retraining and storage for multiple parameter profiles is to perceptually train denoising networks in an angular spatial frequency domain specified only to a selected use case, while adapt it to diverse use cases during inference (or production). In this approach, only one set of frequency- scaled filter frequency responses are used during one single training process, while multiple sets of parameters for performing input / output image resizing are needed during inference (or production). Only one perceptual training of the denoising network is performed on the original image resolution and size, with the 2D spatial filters with frequency-scaled filter frequency responses corresponding to the parameter profile assigned only for the selected use case, for each of the I, P, and T channels. There is no need to repeat perceptual training multiple times and only one set of trained denoising network parameters need to be stored and retrieved, but resizing of input / output image resolution and size is needed during inference (or production).

[00129] In calculating the perceptual loss function values for the perceptual training of image denoising networks, comparable results can be achieved if spatial scaling (i.e., resampling or resizing for images) for the input / output images is performed instead of frequency scaling for the frequency responses of the 2D spatial fdters for I, P, and T channels.

[00130] According to the similarity theorem of 2D Fourier transform, a linear scaling of the spatial frequencies results in an inverse scaling of the spatial variables with a scaling factor. Namely, stretching of an axis in one domain results in a contraction of the corresponding axis in the other domain plus an amplitude change, and vice versa, i.e.,

where are the spatial domain variables and are the spatial frequency domain

variables. The symbol denotes the correspondence between a 2D Fourier transform pair of g(x, y) in the spatial domain and

in the spatial frequency domain. Here the prototype 2D frequency response

can be defined as a separable 2D frequency response

of the 2D spatial filter for one of the I, P, and T channels for calculating

perceptual loss function values.

[00131] According to the convolution theorem of 2D Fourier transform, the convolution of two functions in one domain corresponds to the product of their counterparts in the other domain. Namely, the spatial convolution (equivalent to 2D filtering) in the spatial domain results in the multiplication in the spatial frequency domain, i.e.,

where i(x, y) is the difference image in the spatial domain as the input for one of the 2D spatial filters for I, P, and T channels, and

is its transform in the spatial frequency domain. By suitably substituting the variables in the spatial domain with and the lefthand side

of equation (2) can be replaced as shown in the following

[00132] Therefore, a frequency scaling of the prototype 2D frequency response results

in an inverse spatial scaling of the corresponding 2D impulse response g(x,y) with an amplitude scaling factor. The 2D filtering by such inverse spatially scaled 2D impulse response

on the original input image i(x, y) is equivalent to the 2D filtering by the prototype 2D impulse response on the spatially scaled input image which can be obtained by

resampling (also known as resizing for images) the original input image i(x, y) with a spatially scaled sampling grid.

[00133] Using the same reasoning, the relationship between 2D filtering by the prototype 2D frequency response with two different frequency scaling factor values can also be derived as shown in the following equations. First, the perceptual training of the denoising network is performed only once using a selected use case with the frequency scaling factors

and

, i.e.,

[00134] By substituting the variables in the spatial domain with and where

S_H and S_w are arbitrary frequency scaling factors, the lefthand side of equation (2) can be replaced as shown in the following:

[00135] By comparing equation (2), (4) and (5) above, it can be seen that performing 2D filtering by the prototype 2D frequency response frequency scaled by arbitrary frequency scaling factors S_H and S_W on the original input image i(x, y) is equivalent to performing 2D filtering by the prototype 2D frequency response frequency scaled by selected frequency scaling factors

Therefore, for a denoising network

perceptually trained for a selected use case corresponding to selected frequency scaling factors

and optimal denoising performance can also be achieved for other use cases if the denoising network is applied on spatially scaled input images during inference (or production).

[00136] In order to maintain the constant size of the noisy input images and denoised output images, spatial scaling (i.e., resampling or resizing for images) should be performed before the denoising network, and corresponding inverse spatial scaling should be performed after the denoising network, on an image-by-image basis. To save the computation complexity, inverse spatial scaling can be omitted if the denoising network is performed on a patch-by-patch basis and each input image patch is resampled from the original input image by a spatially scaled sampling grid aligned with a target output pixel position on the original sampling grid.

[00137] In FIG. 1 described above, adaptivity to multiple use cases for a perceptually trained image denoising network during inference (or production) is shown. The two modules for pixel- to-angle and angle-to-pixel domain conversions (106a and 106b, respectively) are inserted into an image processing pipeline during inference (or production). The pixel-to-angle domain conversion module 106a uses spatial scaling to convert images from the original pixel domain to an angular as shown in

equation (5) for adaptivity to use cases without retraining. The angle-to pixel domain conversion module 106b, if not omitted as described above, uses inverse spatial scaling to convert images from the angular spatial frequency domain back to the original pixel domain for maintaining constant image size between the input and output of the denoiser. [00138] To avoid detail loss from the input images caused by domain conversion, the preferred spatial scaling (i.e., resampling or resizing for images) to be performed on the input images is image enlargement, Therefore, the specified

H W angular spatial frequency domain for perceptually training can be selected for a use case with narrowest passband, i.e., with the largest frequency scaling factors and

, for 2D spatial filters

in its corresponding perceptual loss function calculation. For an image processing pipeline during inference (or production), the perceptual color space conversion module 104a and the pixel-to- angle domain conversion module 106a can be fully integrated and controlled by the production use case profile manager module 108, as shown in FIG. 1. Similarly, the angle-to-pixel domain conversion module 106b (if not omitted) and the inverse perceptual color space conversion module 104b can also be fully integrated and controlled by the production use case profile manager module

108.

Claims

CLAIMS What is claimed is:

1. A perceptual image denoising system, comprising a perceptually uniform color space (PUCS) conversion circuit configured to receive an image in a first image format and generate a PUCS image in a PUCS image format; a perceptual denoiser circuit configured to receive the PUCS image and generate a denoised PUCS image in the PUCS image format, the denoised PUCS image having less image noise than the PUCS image; and an inverse PUCS conversion circuit configured to receive the denoised PUCS image and generate an output image in the first image format.

2. The perceptual image denoising system of claim 1, further comprising: a pixel-to-angle domain conversion circuit configured to receive the PUCS image and generate a PUCS angular-frequency (AF) image in an AF domain, wherein the perceptual denoiser circuit is further configured to generate a denoised PUCS AF image from the PUCS AF image; and an angle-to-pixel domain conversion circuit configured to generate the denoised PUCS image from the denoised PUCS AF image.

3. The perceptual image denoising system of either claim 1 or 2, further comprising a use case profile control circuit configured to provide conversion parameters to one or more of the PUCS conversion circuit, the pixel-to-angle domain conversion circuit, the angle-to-pixel domain conversion circuit, or the inverse PUCS conversion circuit.

4. The perceptual image denoising system of any of claims 1 -3, further comprising: a strength circuit configured to apply a transform function to the image to generate a value- scaled image; the PUCS conversion circuit is further configured to receive the value-scaled image; and an inverse strength circuit configured to apply an inverse transform function to the output image to generate a value-descaled output image.

5. The perceptual image denoising system of any of claims 1-4, further comprising a strength control circuit configured to provide one or more strength control parameters to the strength circuit and the inverse strength circuit.

6. The perceptual image denoising system of any of claims 1-5, wherein the one or more strength control parameters comprise transform curves and/or input-output mixing factors.

7. The perceptual image denoising system of any of claims 1-6, wherein the perceptual denoiser circuit comprises a convolutional neural network (CNN).

8. The perceptual image denoising system of any of claims 1-7, wherein the CNN comprises one of a Recurrent CNN, a UNet, or a DenseNet.

9. The perceptual image denoising system of any of claims 1-8, wherein the CNN uses a bias-free

CNN architecture where all biases are removed or set to zero.

10. The perceptual image denoising system of any of claims 1 -9, wherein the perceptual denoiser circuit comprises a trained blind Gaussian denoiser.

11. The perceptual image denoising system of any of claims 1-10, wherein the perceptual denoiser circuit is configured to perform an iterative image restoration process until a termination criterion is detected.

12. The perceptual image denoising system of any of claims 1-11, wherein the termination criterion is a threshold difference between a current perceptually denoised image and a previous perceptually denoised image.

13. An image restoration system comprising: a degradation matrix module configured to: receive an input image and degradation parameters; and generate a characterized input image; and a perceptual image denoising system according to any of the preceding claims, configured to receive the characterized input image and generate a perceptually denoised image.

14. The image restoration system of claim 13, wherein the degradation parameters are one of estimated degradation parameters or assigned degradation parameters.

15. The image restoration system of claim 13 or 14, wherein the assigned degradation parameters are precalculated parameters for characterizing deterministic corruptions.

16. The image restoration system of any of claims 13-15, wherein the estimated degradation parameters are estimated by a trained convolutional neural network (CNN) estimator.

17. The image restoration system of any of claims 13-16, wherein the estimated degradation parameters are estimated from the input image.

18. The image restoration system of any of claims 13-17, wherein the input image is a subset of pixels from a larger image.

19. A method of training a perceptual image denoising system, the method comprising:

Generating, from a clean target image in a first image format, a clean perceptually uniform color space (PUCS) target image in a PUCS image format; generating, from a noisy input image in the first image format, a noisy PUCS input image in the PUCS image format; generating, from the noisy PUCS input image and using a denoiser circuit comprising a convolutional neural network (CNN), an estimated PUCS residual image in the PUCS image format; calculating a target PUCS residual image in the PUCS image format from the clean PUCS target image and the noisy PUCS input image; generating a PUCS loss value for residual learning in the PUCS image format, based on the target PUCS residual image and the estimated PUCS residual image; and modifying a convolutional neural network (CNN) of the denoiser circuit based on the PUCS loss value for residual learning.

20. The method of claim 19, wherein the PUCS comprises first and second channels and generating the PUCS loss value for residual learning comprises: calculating a first channel difference between the first channel of the target PUCS residual image and the first channel of the estimated PUCS residual image; filtering the first channel difference with a first channel two-dimensional (2D) spatial filter; calculating a second channel difference between the second channel of the target PUCS residual image and the second channel of the estimated PUCS residual image; filtering the second channel difference with a second channel 2D spatial filter; and generating the PUCS loss value for residual learning based on the filtered first channel difference and the filtered second channel difference.

21. The method of claim 19 or 20, further comprising: evaluating individual loss values for the filtered first channel difference and the filtered second channel difference; and generating the PUCS loss value for residual learning based on the individually evaluated loss values for the filtered first channel difference and the filtered second channel difference.

22. The method of any of claims 19-21, wherein the filtered first channel difference and the filtered second channel difference are individually evaluated for the individual loss values using one of a LI loss, a L2 loss, or a structural similarity index measure (SSIM).

23. A blind image denoising system with strength control, comprising: a strength circuit configured to apply a transform function to an image to generate a value- scaled image; a blind denoiser circuit configured to receive the value-scaled image and generate a value- scaled denoised image, the value-scaled denoised image having less image noise than the value- scaled image; and an inverse strength circuit configured to apply an inverse transform function to the value- scaled denoised image to generate a value-descaled output image.

24. The blind image denoising system with strength control of claim 23, further comprising a strength control circuit configured to provide one or more strength control parameters to the strength circuit and the inverse strength circuit.

25. The blind image denoising system with strength control of claim 24, wherein the one or more strength control parameters comprise transform curves and/or input-output mixing factors.