WO2024099564A1 - Learnable point spread functions for image rendering - Google Patents

Learnable point spread functions for image rendering

Info

Publication number
WO2024099564A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
point spread
pixel
spread function
depth
Application number
PCT/EP2022/081411
Other languages
French (fr)
Inventor
Richard Shaw
Eduardo PEREZ PELLITERO
Sibi CATLEY-CHANDAR
Ales LEONARDIS
Original Assignee
Huawei Technologies Co., Ltd.
Application filed by Huawei Technologies Co., Ltd.
Priority to PCT/EP2022/081411
Publication of WO2024099564A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00: Image enhancement or restoration
    • G06T5/20: Image enhancement or restoration using local operators
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/10: Image acquisition modality
    • G06T2207/10004: Still image; Photographic image
    • G06T2207/10028: Range image; Depth image; 3D point clouds
    • G06T2207/20: Special algorithmic details
    • G06T2207/20004: Adaptive image processing
    • G06T2207/20012: Locally adaptive
    • G06T2207/20084: Artificial neural networks [ANN]

Definitions

  • This disclosure relates to image processing, in particular to rendering images.
  • Neural Radiance Fields (NeRF) can enable the rendering of high-quality images and depth maps of novel views of a three-dimensional (3D) scene.
  • Given a set of 2D images of a 3D scene with known camera poses (for example, focal length, rotation and translation), NeRF can learn an implicit mapping from spatial coordinates (x, y, z) to volume density and view-dependent colour (RGB).
  • a process known as volume rendering accumulates samples of this scene representation along camera rays to render the scene from any viewpoint.
  • FIG 1 schematically illustrates a visualization of a typical NeRF framework.
  • NeRF enables the rendering of high-quality images and depth maps of novel views of a scene.
  • Multi-view input images of a 3D scene, along with their corresponding camera poses, are processed by the NeRF optimization framework, learning a scene representation enabling novel views (RGB and depth) of the scene to be rendered.
  • NeRF can obtain impressive results for the task of novel view synthesis, but relies on a simplified pinhole camera model, which is a theoretical lens-less model where all rays pass through a single point known as the camera’s optical centre, as shown in Figure 2a. Under such lens-less model assumptions, all rays pass through a single point, which results in images that are entirely in focus (i.e. all parts of the images are equally sharp). However, this is unrealistic, since real cameras typically use complex multi-lens systems, with one or more moving groups designed for, for example, different focal lengths, as shown schematically in Figure 2b. This can result in a complex optical response governed by physical properties such as aperture, focus distance, focal length and depth. Furthermore, all real lenses will exhibit some amount of depth-of-field blur (DoF) and will never be completely in focus.
  • Figure 4a illustrates an example of real lens PSFs as measured scientifically in the lab using specialist equipment.
  • the PSFs are shown for a number of locations across the image sensor.
  • Figure 4b shows a close-up visualization of the PSFs at the corner of the sensor and in the centre for different aperture values: f2.8, f4 and f5.6, as described in M. Bauer et al., “Automatic Estimation of Modulation Transfer Functions”, 2018 IEEE International Conference on Computational Photography (ICCP), 1-12 [Bauer et al. 2018]. As the size of the aperture is increased, and towards the corner of the image sensor, the size of the PSF blur increases.
  • PSFs typically have a sharp edge, as the lens aperture occludes rays, as schematically illustrated in Figure 5a.
  • Figure 5b illustrates other variations in lens blur, for example caused by vignetting of the lens barrel and imperfections of the lens elements.
  • FIGS. 6a to 6c show examples of results from a NeRF trained on shallow DoF images.
  • Figures 6a and 6b show a rendered RGB image and its corresponding depth map respectively.
  • An example of an inaccurate depth produced by NeRF when trained with shallow DoF images is shown in Figure 6c.
  • If NeRF is trained with shallow DoF images, then this DoF blur is “baked” into the NeRF, i.e. the amount of DoF blur or the focus point cannot be changed after optimizing the NeRF. Therefore, although the rendered image is quite accurate, the NeRF fails to accurately reconstruct the depth map of the blurry background. This is due to the NeRF’s pinhole camera model, which assumes the images to be all-in-focus. To compensate, the NeRF distorts the background depth such that the resulting rays render the RGB image to match the training images.
  • a first category of approaches is synthetic blur and discretized depth plane (i.e. post-processing methods to add blur). These methods typically apply depth-varying blur as a post-process to rendered images and depth maps. Given an RGB image and its corresponding depth map, the RGB image is split into segments based on a number of discretized depth planes, and each plane is convolved with a 2D blur kernel.
  • the blur kernel is usually chosen to be a simple circular or Gaussian blur kernel, where the radius or standard deviation of the blur kernel increases with distance from the chosen focus plane, i.e. the further from the focus point, the more blurry the image and thus the larger the radius of the blur kernel. Since these methods are synthetic, they are not able to reconstruct realistic blur from real lenses. Furthermore, if they are to be incorporated into an end-to-end NeRF framework, the discretization of the depth map into planes can cause training instability.
  • a final category is thin-lens modelling.
  • Such methods model out-of-focus blur by explicitly computing the geometry of rays passing through a thin-lens model.
  • These methods are more complex and computationally expensive to train, since many rays need to be rendered to generate out-of-focus blur.
  • They are also trained using synthetically generated blur (for example, Gaussian) due to lack of real paired sharp-blurry data with camera lens parameter labels.
  • These methods are controllable and based on understood lens optics. However, they are still an approximation to real lenses. They are also computationally complex, as many rays need to be rendered.
  • An example of this method is described in Wang et al., "NeRFocus: Neural Radiance Field for 3D Synthetic Defocus", arXiv:2203.05189, 2022.
  • an image processing apparatus for forming an enhanced image, the image processing apparatus being configured to: receive data for an input image, the data comprising respective colour and depth values for each pixel of multiple pixels of the input image; receive a trained point spread function model comprising a respective learned point spread function for each pixel of the multiple pixels of the input image; modify each learned point spread function in dependence on the respective depth value of the respective pixel to form a respective modified point spread function for each pixel of the multiple pixels of the input image; and apply the respective modified point spread function to the respective colour value for one or more pixels comprising the respective pixel to form the enhanced image.
  • the apparatus may be configured to form the enhanced image so as to render depth-of-field blur in the enhanced image. This may allow images with accurate 3D geometry to be rendered.
  • the apparatus may be configured to modify each learned point spread function in dependence on a respective depth mask weight for the respective pixel, each depth mask weight being determined based on a pixel-wise depth difference from a central pixel in the multiple pixels of the input image.
  • the point spread function model described herein is therefore fully spatial- and depth-varying, resulting in a more expressive and realistic model compared to the pinhole camera model used in standard rendering frameworks.
  • Each learned point spread function may comprise a matrix of weights corresponding to the respective pixel and multiple neighboring pixels.
  • Each learned point spread function may be applied to a patch of pixels comprising a central pixel and one or more neighboring pixels. This may allow the rendered blur to be more realistic.
  • Each learned point spread function may be further modifiable in dependence on one or more parameters of an image sensor and camera lens that captured the input image. This can enable the model to be generalizable across scenes, as it is specific to the lens and is scene-agnostic.
  • the one or more parameters may comprise one or more of focal length, focus distance and aperture value. This can allow the point spread functions to be conditioned on such camera parameters.
  • the learned point spread function model may be specific to a particular lens of an image sensor used to capture the input image.
  • the learned point spread functions may vary spatially across the image sensor. This can enable the model to be generalizable across scenes, as it is specific to a lens.
  • the apparatus may be further configured to convert spatial locations in the input image to another coordinate system and apply one or more known properties of an image sensor used to capture the input image to the converted spatial locations. Once the spatial locations have been converted, the apparatus may be configured to apply prior knowledge of an image sensor of the camera to aid with training. For example, the coordinate transform module may apply prior knowledge that the sensor has symmetric properties, thus reducing the learnable space. This may improve the efficiency of training.
  • the trained point spread function model may be scene-agnostic. This may allow the approach to be generalizable across different scenes.
  • the received data may be an output of an image rendering model. This may allow the learned point spread function model to be used in a rendering framework, such as a neural rendering pipeline.
  • the input image may correspond to a novel view output from the image rendering model.
  • the image rendering model may be, for example a Neural Radiance Fields (NeRF) model.
  • the trained point spread function model may be a multi-layer perceptron neural network. This may allow the model to represent a neural field.
  • the trained point spread function model may be trained end-to-end with the image rendering model.
  • When trained end-to-end within a NeRF framework, a sharp all-in-focus internal representation of the 3D scene is learnt, which when rendered using standard volume rendering techniques can enable controllable depth-of-field blur given novel camera parameters.
  • the trained point spread function model may be trained using paired sharp-blurry images with labelled lens parameters. This may allow the model to take the lens parameters into account during learning. The ambiguity between aperture and focus distance can be mitigated when the point spread function model is pre-trained with labelled paired data.
  • a method for forming an enhanced image comprising: receiving data for an input image, the data comprising respective colour and depth values for each pixel of multiple pixels of the input image; receiving a trained point spread function model comprising a respective learned point spread function for each pixel of the multiple pixels of the input image; modifying each learned point spread function in dependence on the respective depth value of the respective pixel to form a modified point spread function for each pixel of the multiple pixels of the input image; and applying the respective modified point spread function to the respective colour value for one or more pixels comprising the respective pixel to form the enhanced image.
  • a computer-readable storage medium having stored thereon computer-readable instructions that, when executed at a computer system, cause the computer system to perform the steps set out above.
  • the computer system may comprise the one or more processors.
  • the computer-readable storage medium may be a non- transitory computer-readable storage medium.
  • Figure 1 schematically illustrates Neural Radiance Fields (NeRF), which can enable the rendering of high-quality images and depth maps of novel views of a scene.
  • Figure 2a schematically illustrates a pinhole camera model.
  • Figure 2b schematically illustrates a complex multi-lens design.
  • Figure 3a illustrates images captured with lenses operating with large apertures and high focal lengths, which inherently have a shallow depth-of-field resulting in photos with progressively- blurry backgrounds.
  • Figure 3b shows an image captured using a portrait mode on a smartphone, which can simulate background blur synthetically.
  • Figure 4a shows examples of real lens point spread functions, with spatial variation across an image sensor [from Bauer et al. 2018].
  • Figure 4b shows examples of real lens point spread functions (PSF) for different apertures (f values) [from Bauer et al. 2018].
  • Figure 5a schematically illustrates how PSFs typically have a hard edge due to rays being occluded by the physical lens aperture.
  • Figure 5b illustrates examples of other variations in lens blur.
  • Figures 6a to 6c show results from a NeRF trained on shallow depth-of-field images: Figure 6a is a rendered RGB image, Figure 6b is a foreground depth map, and Figure 6c is a background depth map.
  • Figure 7a schematically illustrates an artificial intelligence-based lens modelling neural rendering framework.
  • Figure 7b schematically illustrates one implementation of an end-to-end rendering pipeline shown in greater detail.
  • Figure 8 schematically illustrates a network design for a point spread function neural field model.
  • Figure 9 shows an example of a transformation of sensor coordinates from Cartesian to Polar coordinates.
  • Figure 10 schematically illustrates an example of a point spread function modulation function.
  • Figure 11 schematically illustrates the application of a soft weight mask which uses a continuous weighting function.
  • Figure 12 schematically illustrates an example of a point spread function application module.
  • Figure 13 shows the steps of a method of forming an enhanced image in accordance with embodiments of the present invention.
  • Figure 14 schematically illustrates an image processing apparatus in accordance with embodiments of the present invention and some of its associated components.
  • Figures 15a-15c show examples of results on synthetic data.
  • Figure 15a shows an all-in-focus input image
  • Figure 15b shows a ground truth blurry image
  • Figure 15c shows a predicted blurry image with the learnt point spread function model.
  • Figures 16a-16c show examples of results obtained by incorporating a learned lens point spread function model into the NeRF framework and training end-to-end.
  • Figure 16a shows a ground truth sharp image
  • Figure 16b shows a blurry input image
  • Figure 16c shows a recovered sharp image.
  • Figures 17a-17e show some further qualitative results on real world data.
  • Figure 17a shows the all-in-focus input
  • Figure 17b shows the ground truth blur
  • Figure 17c shows the result of fitting the PSF blur model to a real 3D scene captured with the all-in-focus and blurry image pair of Figures 17a and 17b.
  • Figures 17d and 17e show novel views which can be rendered with the learned PSF blur, and this blur can be modified by controlling the blur kernels.
  • Embodiments of the present invention introduce an artificial intelligence (AI) system implementing a learnable lens model that can learn point spread functions (also referred to as kernels).
  • the PSFs can be learned, for example, using neural rendering from multi-view images and depth maps in an image rendering framework.
  • Camera lens-specific PSFs can be learned, which can allow images to be rendered with realistic out-of-focus blur. This can allow for reconstruction and synthesis of images via neural rendering with shallow depth-of-field input images, which can particularly address the difficulties in reconstructing accurate 3D geometry.
  • Figure 7a shows an example of an image rendering pipeline 700.
  • the pipeline comprises a number of separate modules, including a differential rendering model 705, a volume rendering module 707 and a PSF model 711.
  • the PSF model 711 can be inserted into the rendering pipeline 700, as shown schematically in Figure 7a, and can enable the pipeline to both reconstruct and render image blur.
  • the differential rendering model 705 can be trained separately to the PSF model 711 using any suitable known method, or the PSF model 711 may be trained end-to-end with the differential rendering model 705, as will be described later.
  • the pipeline 700 processes multi-view input images 701 of a scene, which may be sharp and/or have a shallow depth-of-field, with known camera parameters (such as aperture, focus and distance).
  • the pipeline 700 reconstructs these images using neural rendering to output rendered blur images 717.
  • the rendered blur images 717 may each have a novel view compared to the input images 701.
  • the pipeline 700 can perform one or more operations on the inputs 701 including ray casting 702, point sampling 703 and frequency encoding 704 before inputting data derived from the input images 701 to the differential rendering model 705.
  • the output of the differential rendering model 705 is (c_k, σ_k), where c_k is a colour and σ_k is a density for a 3D space coordinate k, which is input to the volume rendering module 707 to output colour, C(x,y), and depth, Depth(x,y), values for each pixel of the input image at a pixel location (x, y) of the image sensor, as shown at 708.
  • a patch sampler 709 can sample patches of pixels N(x,y). Each patch may comprise a central pixel and multiple neighbouring pixels.
  • For training the PSF model 711 for a particular lens, multi-view image training data can be used.
  • the data to train the PSF lens model 711 preferably comprises paired multi-view image data of a number of 3D scenes, comprising sharp all-in-focus images and corresponding images with depth-of-field blur.
  • the training data may advantageously span a number of different lens apertures (for example, f1.8 to f16) and focus distances (with camera parameter labels, for example extracted from the camera's Exchangeable Image File Format (EXIF)), as well as encompassing a range of different scene depths.
  • the captured paired image data with camera parameter labels and depth maps can therefore be used to train the lens-specific PSF model 711.
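  • As an illustration of how such camera parameter labels might be collected during data preparation, the sketch below reads the relevant fields from an image's EXIF data using Pillow. The tag IDs follow the Exif standard; the function name and the choice of library are assumptions made for illustration rather than part of the described pipeline.

```python
from PIL import Image

# Exif tag IDs defined by the Exif standard.
FOCAL_LENGTH, F_NUMBER, SUBJECT_DISTANCE = 0x920A, 0x829D, 0x9206
EXIF_IFD_POINTER = 0x8769  # pointer to the photographic sub-IFD

def read_lens_params(path):
    """Return (focal length, aperture f-number, focus distance) labels for a
    training image, or None for fields the camera did not record."""
    exif = Image.open(path).getexif()
    ifd = exif.get_ifd(EXIF_IFD_POINTER)
    return ifd.get(FOCAL_LENGTH), ifd.get(F_NUMBER), ifd.get(SUBJECT_DISTANCE)
```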
  • a neural rendering pipeline can be trained to render sharp depth maps by comparing its output with sharp ground truth depth maps. As all modules are differentiable, this enables end-to-end training of the complete system.
  • a NeRF pipeline is trained end-to-end with the PSF lens model.
  • the learnable PSF lens model 711 is trained to learn PSF kernel weights K_ij 712 for each location on an image sensor of the camera (i.e. for each pixel, and optionally also applied across one or more neighbouring pixels in a patch N(x,y)), given depth at each location, camera lens focal length, focus distance and lens aperture for the lens, as shown at 718.
  • the PSF model 711 for a lens can be fixed and used together with any neural rendering method for end-to-end reconstruction and novel-view rendering of new scenes.
  • the coordinate transform module shown at 710 converts image sensor pixel locations (x,y) to another coordinate system (e.g. polar coordinates) and can apply sensor priors to aid with training of the AI system.
  • the coordinate transform module may apply prior knowledge that the sensor has symmetric properties, thus reducing the learnable space to one half or one quarter of the complete space accordingly.
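  • A minimal sketch of such a coordinate transform is given below. The four-fold folding is one example of a symmetry prior consistent with the half- or quarter-space reduction mentioned above, and the normalisation choices are assumptions made for illustration.

```python
import numpy as np

def sensor_to_polar(x, y, width, height, fold_symmetry=True):
    """Convert pixel coordinates to polar coordinates about the sensor centre.

    If fold_symmetry is True, the angle is folded into one quadrant, encoding
    an assumed four-fold symmetry of the lens response and reducing the space
    the PSF model has to learn."""
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    dx, dy = x - cx, y - cy
    r = np.hypot(dx, dy) / np.hypot(cx, cy)       # radius normalised to [0, 1]
    theta = np.arctan2(dy, dx)                    # angle in (-pi, pi]
    if fold_symmetry:
        theta = np.abs(theta)                     # fold upper/lower halves
        theta = np.minimum(theta, np.pi - theta)  # fold left/right halves
    return r, theta
```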
  • the pipeline also comprises a PSF depth modulation mechanism, which determines a function 713 that modulates PSF kernels K_ij 712 with continuous depth mask weights 714 based on pixel-wise depth differences from a central pixel in a patch N(x,y). Therefore, the continuous depth mask weights are a function that modulates PSF kernels based on pixel-wise relative depth differences. This is used to prevent blurring across parts of the image at different depths.
  • the PSF application module shown at 716 is a mechanism of applying the transformations of spatial and depth-varying point spread functions to rendered images. This module applies the learnt PSF kernel to the RGB colour values C(x,y) of a pixel or a patch of pixels centred on a particular sensor location (x,y), modulated by the continuous depth mask, to generate images 717 rendered with depth-of-field blur.
  • A more detailed embodiment of the pipeline is schematically illustrated in Figure 7b.
  • the pipeline 800 can perform one or more operations on each of the input images 801 including ray casting 802, point sampling 803 and frequency encoding 804 before inputting data derived from the input images to the differential rendering model 805, which in this example is a NeRF model.
  • Given a set of 2D images of a 3D scene with known camera poses (for example, specifying focal length, rotation and translation), the NeRF model 805 can learn an implicit mapping from spatial coordinates (x, y, z) to volume density and view-dependent colour (RGB).
  • the output of the NeRF model 805 is (c_i, σ_i), shown at 806.
  • the volume rendering module 807 accumulates samples of each scene representation along camera rays to render the scene from any viewpoint.
  • the output of the volume rendering module 807 is colour, C(x,y), and depth, Depth(x,y), values for each pixel of the input image at a pixel location (x,y) of the image sensor, as shown at 808.
  • the patch sampler 809 operates as for the patch sampler 709 of Figure 7a.
  • the coordinate transform module 810 converts image sensor pixel locations (x, y) from Cartesian coordinates to Polar coordinates and can apply sensor priors, as described above.
  • the output of the coordinate transform module is frequency encoded at 811.
  • the PSF model 812 in this example is a multi-layer perceptron (MLP) model.
  • the learnable PSF lens model 812 is trained to learn PSF kernel weights K_ij 813 for each location on an image sensor of the camera (i.e. for each pixel and optionally one or more neighbouring pixels in a patch N(x,y)), given depth at each location, camera lens focal length, focus distance and lens aperture for the lens, as shown at 819.
  • these parameters are frequency encoded at 820 before being input to the model 812.
  • the PSF kernels K_ij 813 output from the PSF model 812 are modulated with continuous depth mask weights 815 based on pixel-wise depth differences from a central pixel in a patch N(x,y) from a continuous depth mask, shown at 814, which in this example uses a Gaussian function.
  • this modifies each learned point spread function in dependence on the respective depth value of the respective pixel to form a modified point spread function for each pixel. This results in a spatial and depth varying kernel at 816.
  • the transformations of spatial and depth-varying point spread functions are then applied to rendered images.
  • this is performed using a convolution (dot product), as will be described in more detail with reference to Figure 12.
  • the module 817 applies the learnt PSF kernel to the RGB colour values C(x,y) of a pixel or a patch of pixels centred on a particular sensor location (x,y), modulated by the continuous depth mask, to generate images 818 rendered with depth-of-field blur.
  • the learnable PSF model for a particular camera lens learns to generate PSFs for each pixel.
  • the PSFs may also be termed blur kernels.
  • the PSF for a respective pixel may be applied to the respective pixel and one or more neighbouring pixels of a patch comprising the respective pixel.
  • the PSF for a respective pixel may be applied to a patch of 9 pixels, 16 pixels, 25 pixels, or so on (with the respective pixel at the center of the patch). This may be performed for all pixels of the image.
  • the PSF model is a neural field represented by a 3-layer fully connected multilayer perceptron neural network (MLP) 807.
  • the model takes as input a sensor location (x,y) and scene depth d(x,y) at that location and outputs the corresponding blur kernel weights k_ij.
  • Parameters 806 of a camera 805 can also be fed into the PSF model 807.
  • all inputs to the MLP 807 are encoded to a higher dimensionality using a coordinate transform 803 (for example, from Cartesian to Polar coordinates) and frequency encoding 804, following the standard neural rendering procedure.
  • This frequency encoding can increase the input dimensionality.
  • Alternative neural network architectures may be used.
  • the output of the MLP model 807 is the PSF blur kernel weights.
  • the output is a kernel 808 in vector form, which is reshaped into an s x s matrix 809 containing s² elements. 810 is an example of how such a point spread function may look for real data.
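  • A possible implementation of this neural field is sketched below in PyTorch. The hidden width, the number of frequency bands and the softmax normalisation of the kernel weights are assumptions made for illustration; only the overall structure (frequency-encoded inputs, a small fully connected MLP, an s x s kernel output) follows the description above. In use, the kernels returned for a batch of pixels would then be modulated by the depth mask and applied to the corresponding colour patches, as described below.

```python
import math
import torch
import torch.nn as nn

def freq_encode(x, num_bands=6):
    """NeRF-style frequency (positional) encoding of each input dimension."""
    out = [x]
    for k in range(num_bands):
        out += [torch.sin((2.0 ** k) * math.pi * x),
                torch.cos((2.0 ** k) * math.pi * x)]
    return torch.cat(out, dim=-1)

class PSFField(nn.Module):
    """3-layer MLP mapping encoded sensor coordinates, depth and lens
    parameters to an s x s PSF kernel for that sensor location."""

    def __init__(self, s=5, in_dim=6, num_bands=6, hidden=128):
        super().__init__()
        self.s, self.num_bands = s, num_bands
        enc_dim = in_dim * (2 * num_bands + 1)
        self.mlp = nn.Sequential(
            nn.Linear(enc_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, s * s),
        )

    def forward(self, coords, depth, lens_params):
        # coords: (B, 2) polar sensor coordinates, depth: (B, 1),
        # lens_params: (B, 3) = (focal length, focus distance, aperture).
        x = torch.cat([coords, depth, lens_params], dim=-1)
        weights = self.mlp(freq_encode(x, self.num_bands))
        weights = torch.softmax(weights, dim=-1)   # non-negative, sums to one
        return weights.view(-1, self.s, self.s)
```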
  • the model 807 is fully differentiable so can be trained either separately (pretrained), or end-to-end within a neural rendering framework.
  • the PSF modulation mechanism implements a function that generates a continuous weight mask based on the respective depths of each pixel of a patch of pixels to modulate the learned PSF kernel.
  • This mechanism can therefore output a soft weight mask for a patch of pixels to prevent blurring of the PSF kernels across parts of the scene at different depths, instead of splitting the depth into discontinuous planes.
  • a Gaussian function is used to produce weights W_ij for a patch of pixels 1000 based on the depth difference of a respective pixel 1001 of the patch 1000 from the centre pixel 1002 of the patch of pixels (i.e. d_i'j' - d_ij, where d is depth).
  • the weight mask is used to prevent blurring across parts of the scene at different depths by applying the weights to the kernel values of a patch of pixels.
  • the hyperparameter of the Gaussian’s standard deviation σ can be pre-defined and fixed, or could be a learnt parameter.
  • the function could be represented by a neural network.
  • this mechanism is equivalent to splitting the scene into depth planes, but instead of discretized planes, the soft weight depth mask transforms this into a continuous weighting function for each patch 1-4 shown in the figure, which is better for optimization purposes and model expressivity.
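  • A minimal sketch of such a soft depth mask is given below; the Gaussian form follows the description above, while the function name and the default bandwidth value are arbitrary assumptions.

```python
import numpy as np

def depth_weight_mask(patch_depth, centre_depth, sigma=0.1):
    """Continuous (soft) depth mask for one s x s patch.

    Weights fall off smoothly with the depth difference from the central
    pixel, so the learned PSF is not blurred across large depth
    discontinuities."""
    diff = patch_depth - centre_depth
    return np.exp(-(diff ** 2) / (2.0 * sigma ** 2))
```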
  • the PSF application module applies the corresponding PSF at each sensor location (i.e. for each respective pixel and one or more neighbouring pixels in a patch comprising a respective pixel at the center of the patch) to produce the final RGB output image C with rendered depth-of-field blur.
  • This module generates the final image pixel values with rendered depth-of-field blur. It fuses a patch of rendered colour pixels, with each pixel having a respective colour value c_ij, the learned PSF kernel at that sensor location, and the modulating weight mask produced by the PSF modulation mechanism.
  • a simple dot product of the colour values with the kernel weights and the PSF modulation weights, summed over the patch, is performed.
  • Other implementations may alternatively be used, for example using a dictionary of learned kernels and applying them in the fast Fourier Transform (FFT) domain.
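  • As a sketch, the fusion described above reduces to an element-wise product summed over the patch. The function name and the renormalisation step (which keeps overall brightness roughly constant) are illustrative assumptions.

```python
import numpy as np

def apply_psf(patch_rgb, kernel, depth_mask):
    """Fuse one colour patch with its learned PSF kernel and depth mask.

    patch_rgb:  s x s x 3 rendered colours centred on the target pixel
    kernel:     s x s learned PSF weights for this sensor location
    depth_mask: s x s soft depth-modulation weights
    Returns the blurred RGB value of the central pixel."""
    w = kernel * depth_mask
    w = w / (w.sum() + 1e-8)
    return (w[..., None] * patch_rgb).sum(axis=(0, 1))
```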
  • the loss (L1 loss) between the ground truth image I and the predicted image Î can be minimized according to the following loss function:
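  • A per-pixel L1 reconstruction loss of the standard form, summed over pixel locations, is assumed here:

\[
\mathcal{L} = \sum_{(x,y)} \left\| \hat{I}(x,y) - I(x,y) \right\|_1
\]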
  • Figure 13 shows the steps of an exemplary method for forming an enhanced image.
  • the method comprises receiving data for an input image, the data comprising respective colour and depth values for each pixel of multiple pixels of the input image.
  • the data for the input image may comprise respective colour and depth values for every pixel of the input image.
  • the method comprises receiving a trained point spread function model comprising a respective learned point spread function for each pixel of the multiple pixels of the input image.
  • the method comprises modifying each learned point spread function in dependence on the respective depth value of the respective pixel to form a respective modified point spread function for each pixel of the multiple pixels of the input image.
  • the method comprises applying the respective modified point spread function to the respective colour value for one or more pixels comprising the respective pixel to form the enhanced image.
  • the respective modified point spread function may be applied to a patch of pixels centred on the respective pixel. This may be performed for each pixel of the multiple pixels of the input image. This may be performed for every pixel of the input image to form the enhanced image.
  • This method may be implemented on-device (for example, on a smartphone) or externally, such as on cloud-based services.
  • FIG 14 shows an example of an image processing apparatus for performing the methods described herein.
  • the apparatus 1400 may comprise at least one processor, such as processors 1401, and at least one memory, such as memory 1402.
  • the memory stores in a non-transient way code that is executable by the processor(s) to implement the apparatus in the manner described herein.
  • the apparatus may also comprise one or more image sensors 1403 configured to capture an image which can then be input to a rendering pipeline implemented by the processor 1401 or used to train the rendering pipeline in the manner described herein.
  • This AI system can enable depth-of-field blur to be rendered in images rendered by a novel view synthesis method such as NeRF.
  • the PSF lens blur model is represented by a neural field, which can be conditioned on sensor location, depth and captured camera parameters (such as focal length, focus distance and aperture).
  • the depth weighting mechanism modulates the learned blur kernels to prevent blurring across parts of the scene at significantly different depths.
  • the learnt lens blur kernels and depth weighting are then applied to image patches rendered by a neural rendering framework to generate images with depth-of-field blur.
  • the implementation of the learned PSF model in the rendering pipeline can lead to a more expressive camera model, capable of generating images with realistic depth-of-field blur, rather than the sharp all-in-focus images normally generated by standard NeRF models.
  • the PSF model can also lead to better geometry (depth) reconstruction when the learnt lens-specific PSF model is incorporated into a NeRF framework and trained end-to-end. This is because the NeRF has to update the depth map to produce sharp colours before the blur model is applied.
  • lens-specific (scene-agnostic) blur can be learned from real images.
  • the learned PSFs are generalizable across scenes, and novel views can be rendered and the blur controlled a priori.
  • This approach therefore provides for a controllable and learnable system that generates DoF blur by taking colour and depth values from volumetric rendering and applying point spread functions.
  • the neural field can learn spatially-varying kernel weights K_ij based on real camera parameters (focal length, focus distance, aperture value), transformed sensor coordinates, and depth values.
  • the present approach can enable learning of spatial- and depth-varying PSF kernels for each location on the sensor, leading to the realistic reconstruction of images containing DoF blur.
  • Sensor coordinates can be transformed, for example to polar coordinates in a quotient space.
  • Conditioning the neural field on camera parameters enables view synthesis with novel camera parameters.
  • the continuous depth mask provides a smooth function that modulates the PSF kernels based on pixel-wise depth differences, preventing blurring across large depth discontinuities and avoiding problems associated with discretized depth planes. This can enable the application of learnt PSF kernels to colours and depth provided by neural rendering, resulting in renders with realistic DoF blur.
  • Figures 15a-15c show qualitative results of fitting a lens PSF model to a synthetic 3D scene.
  • the present model is able to render the out-of-focus blur accurately, with a PSNR of approximately 50 dB when compared to the ground truth blurry image.
  • Figure 15a shows an all-in-focus input image
  • Figure 15b shows a ground truth blurry image
  • Figure 15c shows a predicted blurry image obtained using the present approach, with an example of a PSF blur kernel learned through optimization of the model.
  • Figures 17a-17e show some further qualitative results on real world data.
  • Figure 17a shows the all-in-focus input
  • Figure 17b shows the ground truth blur
  • Figure 17c shows the result of fitting the PSF blur model to a real 3D scene captured with the all-in-focus and blurry image pair of Figures 17a and 17b.
  • Figures 17d and 17e show novel views which can be rendered with the learned PSF blur, and this blur can be modified by controlling the blur kernels.
  • the lens model is conditioned on real camera parameters and thus enables rendering of images with novel camera parameters.
  • the ambiguity between aperture and focus distance is mitigated since the lens PSF model is pretrained with labelled paired data, i.e. the lens model is conditioned on camera parameters.
  • the PSF lens model can be pre-trained, for example in the lab, using paired sharp and blurry images with known camera parameters as input. This enables the lens model to be generalizable across scenes, as it is specific to the lens and is scene-agnostic.
  • the trained point spread function model can advantageously be learned based on camera parameters of an image sensor.
  • the model is therefore also controllable and conditioned on lens parameters, enabling novel views to be rendered with novel camera parameters, facilitating the depth-of-field to be changed after capture and NeRF reconstruction (whereas in most prior NeRF methods, the depth-of-field is baked into the NeRF model and cannot be changed).
  • the PSF model can advantageously be conditioned on known camera settings (such as focus distance and aperture from the camera’s EXIF data) which are fed into the model; this enables easy control of the blur by changing the input camera parameters. Therefore, the method can learn to disambiguate between focus distance and aperture value and enables novel view synthesis with new camera parameters (such as aperture and focus distance).
  • Embodiments of the present invention can therefore provide a general lens model that is scene-agnostic (and camera and lens dependent instead) and thus generalizable across different scenes.
  • the PSF model described herein can therefore learn arbitrary blur PSF kernels from real data and is fully spatial- and depth-varying, resulting in a more expressive and realistic model compared to the pinhole camera model used in standard NeRF frameworks.
  • By incorporating a PSF model for a camera lens into a neural rendering framework, the behaviour of real lenses can be better modelled, and images can be reconstructed with realistic depth-of-field blur.
  • a sharp all-in-focus internal representation of the 3D scene is learnt which, when rendered using standard volume rendering techniques, enables controllable depth-of-field blur given novel camera parameters.


Abstract

An image processing apparatus (1400) for forming an enhanced image (717, 818), the apparatus (1400) being configured to: receive (1301) data for an input image, the data comprising respective colour and depth values for each pixel of multiple pixels; receive (1302) a trained point spread function model (711, 812) comprising a respective learned point spread function (712, 813) for each pixel of the multiple pixels of the input image; modify (1303) each learned point spread function (712, 813) in dependence on the respective depth value of the respective pixel to form a modified point spread function (715, 816) for each pixel of the multiple pixels of the input image; and apply (1304) the respective modified point spread function (715, 816) to the respective colour value for one or more pixels comprising the respective pixel to form the enhanced image (717, 818). By incorporating a point spread function model, the behaviour of real lenses can be better modelled and images can be reconstructed with realistic depth-of-field blur.

Description

LEARNABLE POINT SPREAD FUNCTIONS FOR IMAGE RENDERING
FIELD OF THE INVENTION
This disclosure relates to image processing, in particular to rendering images.
BACKGROUND
Neural Radiance Fields (NeRF), as described in B. Mildenhall et al., “NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis”, The 2020 European Conference on Computer Vision (ECCV 2020), can enable the rendering of high-quality images and depth maps of novel views of a three-dimensional (3D) scene. Given a set of two-dimensional (2D) images of a 3D scene with known camera poses (for example, focal length, rotation and translation), NeRF can learn an implicit mapping from spatial coordinates (x, y, z) to volume density and view-dependent colour (RGB). A process known as volume rendering accumulates samples of this scene representation along camera rays to render the scene from any viewpoint.
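For reference, the volume rendering step can be written in the standard quadrature form used in the NeRF literature, where c_k and σ_k are the sampled colour and density along a ray r, δ_k is the spacing between adjacent samples and T_k is the accumulated transmittance; this formulation is reproduced here only as background:

\[
\hat{C}(\mathbf{r}) = \sum_{k=1}^{K} T_k \left(1 - e^{-\sigma_k \delta_k}\right) c_k, \qquad T_k = \exp\!\left(-\sum_{j=1}^{k-1} \sigma_j \delta_j\right)
\]

A depth map can be rendered analogously by replacing the sample colours c_k with the sample distances along the ray.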
Figure 1 schematically illustrates a visualization of a typical NeRF framework. NeRF enables the rendering of high-quality images and depth maps of novel views of a scene. Multi-view input images of a 3D scene, along with their corresponding camera poses, are processed by the NeRF optimization framework, learning a scene representation enabling novel views (RGB and depth) of the scene to be rendered.
NeRF can obtain impressive results for the task of novel view synthesis, but relies on a simplified pinhole camera model, which is a theoretical lens-less model where all rays pass through a single point known as the camera’s optical centre, as shown in Figure 2a. Under such lens-less model assumptions, all rays pass through a single point, which results in images that are entirely in focus (i.e. all parts of the images are equally sharp). However, this is unrealistic, since real cameras typically use complex multi-lens systems, with one or more moving groups designed for, for example, different focal lengths, as shown schematically in Figure 2b. This can result in a complex optical response governed by physical properties such as aperture, focus distance, focal length and depth. Furthermore, all real lenses will exhibit some amount of depth-of-field blur (DoF) and will never be completely in focus.
In non-pinhole camera models, the spread of rays onto the image sensor creates image blur in out-of-focus regions. This shallow depth-of-field (DoF) blur is often exploited for aesthetic photographic or video purposes, such as to focus the viewer's attention on a particular subject or isolate the subject from the background, to create a more "cinematic" large sensor or aperture look, or to create aesthetically pleasing "bokeh" effects, as shown in Figures 3a and 3b. As illustrated in Figure 3a, lenses operating with large apertures and high focal lengths inherently have a shallow depth-of-field resulting in photos with progressively-blurry backgrounds. The wider the aperture (the smaller the f-stop) the greater the out-of-focus blur. As illustrated in Figure 3b, portrait mode on smartphones can simulate background blur synthetically, but such modes are often error-prone and fail to consider spatially varying behaviour.
This depth-of-field blur can be modelled by a point spread function (PSF), which describes the response of a focused optical imaging system to a point source or point object. A more general term for the PSF is the system's impulse response: the PSF is the impulse response or impulse response function (IRF) of a focused optical imaging system. PSFs are camera and lens-specific and vary spatially across the image sensor. The PSF of a lens system depends on a number of factors, such as lens aperture, focal length, focus distance, sensor position and scene depth.
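In discrete form, the effect of a spatially varying PSF on an ideal sharp image can be written as the following sum, where k_{x,y} denotes the PSF centred on sensor location (x, y) and N(x, y) its support; this standard image-formation model is included here only for context:

\[
I_{\mathrm{blur}}(x,y) = \sum_{(i,j) \in \mathcal{N}(x,y)} k_{x,y}(i,j)\, I_{\mathrm{sharp}}(i,j)
\]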
Figure 4a illustrates an example of real lens PSFs as measured scientifically in the lab using specialist equipment. The PSFs are shown for a number of locations across the image sensor. Figure 4b shows a close-up visualization of the PSFs at the corner of the sensor and in the centre for different aperture values: f2.8, f4 and f5.6, as described in M. Bauer et al., “Automatic Estimation of Modulation Transfer Functions”, 2018 IEEE International Conference on Computational Photography (ICCP), 1-12 [Bauer et al. 2018]. As the size of the aperture is increased, and towards the corner of the image sensor, the size of the PSF blur increases.
PSFs typically have a sharp edge, as the lens aperture occludes rays, as schematically illustrated in Figure 5a. Figure 5b illustrates other variations in lens blur, for example caused by vignetting of the lens barrel and imperfections of the lens elements.
Due to the simplified pinhole camera model used by most neural rendering frameworks, such frameworks cannot handle input images containing shallow DoF blur, since the model inherently assumes all-in-focus input images. To be able to reconstruct images containing DoF blur, when trained with shallow DoF input images, neural rendering approaches distort the learnt geometry (depth maps) such that rays rendered from the scene representation generate blur matching the training images. The resulting poor-quality geometry reconstruction typically degrades the ability to perform novel view synthesis. Figures 6a to 6c show examples of results from a NeRF trained on shallow DoF images. Figures 6a and 6b show a rendered RGB image and its corresponding depth map respectively. An example of an inaccurate depth produced by NeRF when trained with shallow DoF images is shown in Figure 6c. If NeRF is trained with shallow DoF images, then this DoF blur is “baked” into the NeRF, i.e. the amount of DoF blur or the focus point cannot be changed after optimizing the NeRF. Therefore, although the rendered image is quite accurate, the NeRF fails to accurately reconstruct the depth map of the blurry background. This is due to the NeRF’s pinhole camera model, which assumes the images to be all-in-focus. To compensate, the NeRF distorts the background depth such that the resulting rays render the RGB image to match the training images.
Several methods have been developed to try to overcome the above issues. A first category of approaches is synthetic blur and discretized depth plane (i.e. post-processing methods to add blur). These methods typically apply depth-varying blur as a post-process to rendered images and depth maps. Given an RGB image and its corresponding depth map, the RGB image is split into segments based on a number of discretized depth planes, and each plane is convolved with a 2D blur kernel. The blur kernel is usually chosen to be a simple circular or Gaussian blur kernel, where the radius or standard deviation of the blur kernel increases with distance from the chosen focus plane, i.e. the further from the focus point, the more blurry the image and thus the larger the radius of the blur kernel. Since these methods are synthetic, they are not able to reconstruct realistic blur from real lenses. Furthermore, if they are to be incorporated into an end-to-end NeRF framework, the discretization of the depth map into planes can cause training instability.
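A minimal sketch of this class of post-processing methods is given below. The number of planes, the Gaussian-radius schedule and the helper names are illustrative assumptions rather than any specific published implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def depth_plane_blur(rgb, depth, focus_depth, num_planes=8, blur_gain=2.0):
    """Prior-art style post-process: discretize depth into planes and blur
    each plane with a Gaussian whose radius grows with distance from the
    chosen focus plane.

    rgb:   H x W x 3 all-in-focus image (float)
    depth: H x W depth map
    """
    edges = np.linspace(depth.min(), depth.max(), num_planes + 1)
    out = np.zeros_like(rgb)
    for p in range(num_planes):
        mask = (depth >= edges[p]) & (depth <= edges[p + 1])
        plane_depth = 0.5 * (edges[p] + edges[p + 1])
        sigma = blur_gain * abs(plane_depth - focus_depth)
        blurred = gaussian_filter(rgb, sigma=(sigma, sigma, 0))
        out[mask] = blurred[mask]
    return out
```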
These approaches generally have low computational complexity and blur is easily controllable a priori, using a simple algorithm. However, as mentioned above, the blur is generally not realistic (not learned from data and is synthetically generated a priori). These methods assume all-in-focus input images, which may not be available. Blur modelling is a post-process (it does not affect the NeRF reconstruction). Discretized depth planes can also cause issues, such as haloing around objects, discontinuities that can affect training stability, and the need to decide how many planes to use. Examples of these methods are described in B. Mildenhall et al., “NeRF in the Dark: High Dynamic Range View Synthesis from Noisy Raw Images”, Computer Vision and Pattern Recognition Conference (CVPR) 2022 and Busam et al., “Sterefo: Efficient Image Refocusing with Stereo Vision”, International Conference on Computer Vision Workshops (ICCVW) 2019.
Another category of prior approaches involves using blur models that fit to the scene. This family of algorithms provides a mechanism to blur the observations. However, this blur is overfitted to the scene and is thus non-generalizable. The models are learnt from real data and are usually more expressive, allowing them to better reconstruct real blur. However, they are generally not easily controllable. The blur model is fitted to each scene separately and is not generalizable (i.e. it is a model of the scene, not of the camera plus lens). Also, most methods cannot handle large blur sizes. These models are usually learnt in a self-supervised way and therefore fail with consistent blur (when all training images are blurred in the same way). They also cannot disentangle focus distance and aperture. Examples of these methods include Deblur-NeRF, described in Li et al., “Deblur-NeRF: Neural Radiance Fields from Blurry Images”, Computer Vision and Pattern Recognition Conference (CVPR) 2022, and DoF-NeRF, described in Wu et al., “DoF-NeRF: Depth-of-Field Meets Neural Radiance Fields”, ACM International Conference on Multimedia (ACMMM) 2022.
A final category is thin-lens modelling. Such methods model out-of-focus blur by explicitly computing the geometry of rays passing through a thin-lens model. These methods are more complex and computationally expensive to train, since many rays need to be rendered to generate out-of-focus blur. They are also trained using synthetically generated blur (for example, Gaussian) due to the lack of real paired sharp-blurry data with camera lens parameter labels. These methods are controllable and based on understood lens optics. However, they are still an approximation to real lenses. They are also computationally complex, as many rays need to be rendered. An example of this method is described in Wang et al., "NeRFocus: Neural Radiance Field for 3D Synthetic Defocus", arXiv:2203.05189, 2022.
It is desirable to develop a method that can overcome at least some of the above issues.
SUMMARY OF THE INVENTION
According to one aspect, there is provided an image processing apparatus for forming an enhanced image, the image processing apparatus being configured to: receive data for an input image, the data comprising respective colour and depth values for each pixel of multiple pixels of the input image; receive a trained point spread function model comprising a respective learned point spread function for each pixel of the multiple pixels of the input image; modify each learned point spread function in dependence on the respective depth value of the respective pixel to form a respective modified point spread function for each pixel of the multiple pixels of the input image; and apply the respective modified point spread function to the respective colour value for one or more pixels comprising the respective pixel to form the enhanced image. By incorporating a point spread function model for a camera lens into a rendering framework, the behaviour of real lenses can be better modelled, and images can be reconstructed with realistic depth-of-field blur. This can allow for reconstruction and synthesis of images via neural rendering with shallow depth-of-field input images, which can particularly address the difficulties in reconstructing accurate 3D geometry.
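The core of this processing can be illustrated with a short sketch. The psf_model callable, the patch size s and the mask bandwidth sigma_d are assumptions made for illustration; the sketch simply shows a per-pixel learned kernel being modulated by a depth-difference weight and applied to a colour patch.

```python
import numpy as np

def render_dof(rgb, depth, psf_model, s=5, sigma_d=0.1):
    """Apply learned, depth-modulated PSFs to a rendered RGB/depth pair.

    rgb:       H x W x 3 colours from the renderer
    depth:     H x W depth map from the renderer
    psf_model: callable (x, y, d) -> s*s learned PSF kernel weights
    """
    H, W, _ = rgb.shape
    r = s // 2
    out = rgb.copy()  # border pixels keep their rendered values
    for y in range(r, H - r):
        for x in range(r, W - r):
            patch_rgb = rgb[y - r:y + r + 1, x - r:x + r + 1]
            patch_d = depth[y - r:y + r + 1, x - r:x + r + 1]
            k = np.asarray(psf_model(x, y, depth[y, x])).reshape(s, s)
            # Depth modulation: suppress kernel taps at very different depths.
            w = k * np.exp(-((patch_d - depth[y, x]) ** 2) / (2 * sigma_d ** 2))
            w /= w.sum() + 1e-8
            out[y, x] = (w[..., None] * patch_rgb).sum(axis=(0, 1))
    return out
```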
The apparatus may be configured to form the enhanced image so as to render depth-of-field blur in the enhanced image. This may allow images with accurate 3D geometry to be rendered.
The apparatus may be configured to modify each learned point spread function in dependence on a respective depth mask weight for the respective pixel, each depth mask weight being determined based on a pixel-wise depth difference from a central pixel in the multiple pixels of the input image. The point spread function model described herein is therefore fully spatial- and depth-varying, resulting in a more expressive and realistic model compared to the pinhole camera model used in standard rendering frameworks.
Each learned point spread function may comprise a matrix of weights corresponding to the respective pixel and multiple neighboring pixels. Each learned point spread function may be applied to a patch of pixels comprising a central pixel and one or more neighboring pixels. This may allow the rendered blur to be more realistic.
Each learned point spread function may be further modifiable in dependence on one or more parameters of an image sensor and camera lens that captured the input image. This can enable the model to be generalizable across scenes, as it is specific to the lens and is scene-agnostic.
The one or more parameters may comprise one or more of focal length, focus distance and aperture value. This can allow the point spread functions to be conditioned on such camera parameters.
The learned point spread function model may be specific to a particular lens of an image sensor used to capture the input image. The learned point spread functions may vary spatially across the image sensor. This can enable the model to be generalizable across scenes, as it is specific to a lens. The apparatus may be further configured to convert spatial locations in the input image to another coordinate system and apply one or more known properties of an image sensor used to capture the input image to the converted spatial locations. Once the spatial locations have been converted, the apparatus may be configured to apply prior knowledge of an image sensor of the camera to aid with training. For example, the coordinate transform module may apply prior knowledge that the sensor has symmetric properties, thus reducing the learnable space. This may improve the efficiency of training.
The trained point spread function model may be scene-agnostic. This may allow the approach to be generalizable across different scenes.
The received data may be an output of an image rendering model. This may allow the learned point spread function model to be used in a rendering framework, such as a neural rendering pipeline.
The input image may correspond to a novel view output from the image rendering model. The image rendering model may be, for example a Neural Radiance Fields (NeRF) model.
The trained point spread function model may be a multi-layer perceptron neural network. This may allow the model to represent a neural field.
The trained point spread function model may be trained end-to-end with the image rendering model. When trained end-to-end within a NeRF framework, a sharp all-in-focus internal representation of the 3D scene is learnt, which when rendered using standard volume rendering techniques can enable controllable depth-of-field blur given novel camera parameters.
The trained point spread function model may be trained using paired sharp-blurry images with labelled lens parameters. This may allow the model to take the lens parameters into account during learning. The ambiguity between aperture and focus distance can be mitigated when the point spread function model is pre-trained with labelled paired data.
According to a second aspect there is provided a method for forming an enhanced image, the method comprising: receiving data for an input image, the data comprising respective colour and depth values for each pixel of multiple pixels of the input image; receiving a trained point spread function model comprising a respective learned point spread function for each pixel of the multiple pixels of the input image; modifying each learned point spread function in dependence on the respective depth value of the respective pixel to form a modified point spread function for each pixel of the multiple pixels of the input image; and applying the respective modified point spread function to the respective colour value for one or more pixels comprising the respective pixel to form the enhanced image.
By incorporating a point spread function model for a camera lens into a rendering method, the behaviour of real lenses can be better modelled, and images can be reconstructed with realistic depth-of-field blur. This can allow for reconstruction and synthesis of images via neural rendering with shallow depth-of-field input images, which can particularly address the difficulties in reconstructing accurate 3D geometry.
According to a further aspect, there is provided a computer-readable storage medium having stored thereon computer-readable instructions that, when executed at a computer system, cause the computer system to perform the steps set out above. The computer system may comprise the one or more processors. The computer-readable storage medium may be a non- transitory computer-readable storage medium.
BRIEF DESCRIPTION OF THE FIGURES
The present disclosure will now be described by way of example with reference to the accompanying drawings. In the drawings:
Figure 1 schematically illustrates Neural Radiance Fields (NeRF), which can enable the rendering of high-quality images and depth maps of novel views of a scene.
Figure 2a schematically illustrates a pinhole camera model.
Figure 2b schematically illustrates a complex multi-lens design.
Figure 3a illustrates images captured with lenses operating with large apertures and high focal lengths, which inherently have a shallow depth-of-field resulting in photos with progressively- blurry backgrounds. The wider the aperture (the smaller the f-stop) the greater the out-of-focus blur.
Figure 3b shows an image captured using a portrait mode on a smartphone, which can simulate background blur synthetically.
Figure 4a shows examples of real lens point spread functions, with spatial variation across an image sensor [from Bauer et al. 2018].
Figure 4b shows examples of real lens point spread functions (PSF) for different apertures (f values) [from Bauer et al. 2018].
Figure 5a schematically illustrates how PSFs typically have a hard edge due to rays being occluded by the physical lens aperture.
Figure 5b illustrates examples of other variations in lens blur.
Figures 6a to 6c show results from a NeRF trained on shallow depth-of-field images: Figure 6a is a rendered RGB image, Figure 6b is a foreground depth map, and Figure 6c is a background depth map.
Figure 7a schematically illustrates an artificial intelligence-based lens modelling neural rendering framework.
Figure 7b schematically illustrates one implementation of an end-to-end rendering pipeline shown in greater detail.
Figure 8 schematically illustrates a network design for a point spread function neural field model.
Figure 9 shows an example of a transformation of sensor coordinates from Cartesian to Polar coordinates.
Figure 10 schematically illustrates an example of a point spread function modulation function.
Figure 11 schematically illustrates the application of a soft weight mask which uses a continuous weighting function.
Figure 12 schematically illustrates an example of a point spread function application module.
Figure 13 shows the steps of a method of forming an enhanced image in accordance with embodiments of the present invention.
Figure 14 schematically illustrates an image processing apparatus in accordance with embodiments of the present invention and some of its associated components.
Figures 15a-15c show examples of results on synthetic data. Figure 15a shows an all-in-focus input image, Figure 15b shows a ground truth blurry image and Figure 15c shows a predicted blurry image with the learnt point spread function model.
Figures 16a-16c show examples of results obtained by incorporating a learned lens point spread function model into the NeRF framework and training end-to-end. Figure 16a shows a ground truth sharp image, Figure 16b shows a blurry input image and Figure 16c shows a recovered sharp image.
Figures 17a-17e show some further qualitative results on real world data. Figure 17a shows the all-in-focus input, Figure 17b shows the ground truth blur, and Figure 17c shows the result of fitting the PSF blur model to a real 3D scene captured with the all-in-focus and blurry image pair of Figures 17a and 17b. Figures 17d and 17e show novel views which can be rendered with the learned PSF blur; this blur can be modified by controlling the blur kernels.
DETAILED DESCRIPTION
Embodiments of the present invention introduce an artificial intelligence (AI) system implementing a learnable lens model that can learn point spread functions (also referred to as kernels). The PSFs can be learned, for example, using neural rendering from multi-view images and depth maps in an image rendering framework. Camera lens-specific PSFs can be learned, which can allow images to be rendered with realistic out-of-focus blur. This can allow for reconstruction and synthesis of images via neural rendering from shallow depth-of-field input images, which can particularly address the difficulties in reconstructing accurate 3D geometry.
Figure 7a shows an example of an image rendering pipeline 700. The pipeline comprises a number of separate modules, including a differential rendering model 705, a volume rendering module 707 and a PSF model 711. The PSF model 711 can be inserted into the rendering pipeline 700, as shown schematically in Figure 7a, and can enable the pipeline to both reconstruct and render image blur. The differential rendering model 705 can be trained separately from the PSF model 711 using any suitable known method, or the PSF model 711 may be trained end-to-end with the differential rendering model 705, as will be described later.
In this example, the pipeline 700 processes multi-view input images 701 of a scene, which may be sharp and/or have a shallow depth-of-field, with known camera parameters (such as aperture and focus distance). The pipeline 700 reconstructs these images using neural rendering to output rendered blur images 717. The rendered blur images 717 may each have a novel view compared to the input images 701. The pipeline 700 can perform one or more operations on the inputs 701, including ray casting 702, point sampling 703 and frequency encoding 704, before inputting data derived from the input images 701 to the differential rendering model 705.
As shown at 706, the output of the differential rendering model 705 is (c_k, σ_k), where c_k is a colour and σ_k is a density for a 3D space coordinate k. This output is input to the volume rendering module 707, which outputs colour, C(x,y), and depth, Depth(x,y), values for each pixel of the input image at a pixel location (x,y) of the image sensor, as shown at 708. A patch sampler 709 can sample patches of pixels N(x,y). Each patch may comprise a central pixel and multiple neighbouring pixels.
For training the PSF model 711 for a particular lens, multi-view image training data can be used. The data to train the PSF lens model 711 preferably comprises paired multi-view image data of a number of 3D scenes, comprising sharp all-in-focus images and corresponding images with depth-of-field blur. The training data may advantageously span a number of different lens apertures (for example, f1.8 to f16) and focus distances (with camera parameter labels, for example extracted from the camera's Exchangeable Image File Format (EXIF)), as well as encompassing a range of different scene depths. The captured paired image data with camera parameter labels and depth maps can therefore be used to train the lens-specific PSF model 711.
Using the sharp all-in-focus images, a neural rendering pipeline can be trained to render sharp depth maps by comparing its output with sharp ground truth depth maps. As all modules are differentiable, this enables end-to-end training of the complete system. In a preferred implementation, a NeRF pipeline is trained end-to-end with the PSF lens model.
The learnable PSF lens model 711 is trained to learn PSF kernel weights K_ij 712 for each location on an image sensor of the camera (i.e. for each pixel, optionally also applied across one or more neighbouring pixels in a patch N(x,y)), given the depth at each location and the camera lens focal length, focus distance and lens aperture, as shown at 718.
After pre-training with a per-lens paired dataset, as described above, the PSF model 711 for a lens can be fixed and used together with any neural rendering method for end-to-end reconstruction and novel-view rendering of new scenes.
The coordinate transform module shown at 710 converts image sensor pixel locations (x,y) to another coordinate system (e.g. polar coordinates) and can apply sensor priors to aid with training of the AI system. For example, the coordinate transform module may apply prior knowledge that the sensor has symmetric properties, thus reducing the learnable space to one half or one quarter of the complete space accordingly.
The pipeline also comprises a PSF depth modulation mechanism, which determines a function 713 that modulates the PSF kernels K_ij 712 with continuous depth mask weights W_ij 714 based on pixel-wise depth differences from a central pixel in a patch N(x,y). The continuous depth mask weights therefore form a function that modulates the PSF kernels based on pixel-wise relative depth differences. This is used to prevent blurring across parts of the image at different depths.
By applying the depth mask weights W_ij to the kernel weights K_ij, each learned point spread function is modified in dependence on the respective depth value of the respective pixel to form a modified point spread function for each pixel. This results in a spatial- and depth-varying kernel at 715. The PSF application module shown at 716 is a mechanism for applying the transformations of spatial- and depth-varying point spread functions to rendered images. This module applies the learnt PSF kernel to the RGB colour values C(x,y) of a pixel or a patch of pixels centred on a particular sensor location (x,y), modulated by the continuous depth mask, to generate images 717 rendered with depth-of-field blur.
A more detailed embodiment of the pipeline is schematically illustrated in Figure 7b.
The pipeline 800 can perform one or more operations on each of the input images 801 including ray casting 802, point sampling 803 and frequency encoding 804 before inputting data derived from the input images to the differential rendering model 805, which in this example is a NeRF model.
Given a set of 2D images of a 3D scene with known camera poses (for example, specifying focal length, rotation and translation), the NeRF model 805 can learn an implicit mapping from spatial coordinates (x, y, z) to volume density and view-dependent colour (RGB). The output of the NeRF model 805 is (c_k, σ_k), shown at 806. The volume rendering module 807 accumulates samples of each scene representation along camera rays to render the scene from any viewpoint. The output of the volume rendering module 807 is colour, C(x,y), and depth, Depth(x,y), values for each pixel of the input image at a pixel location (x,y) of the image sensor, as shown at 808. The patch sampler 809 operates as for the patch sampler 709 of Figure 7a.
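As a rough illustration only (the function and variable names below are not taken from the source), volume rendering along a single camera ray can be sketched with the standard NeRF quadrature, accumulating sample colours and densities into a pixel colour and an expected depth:

```python
import numpy as np

def volume_render(colours, sigmas, t_vals):
    """Accumulate per-sample colours c_k and densities sigma_k along one camera ray.

    colours: (S, 3) array, sigmas: (S,) array, t_vals: (S,) increasing sample distances.
    Returns the rendered pixel colour C(x, y) and the expected depth Depth(x, y).
    """
    deltas = np.diff(t_vals, append=t_vals[-1] + 1e10)   # spacing between consecutive samples
    alphas = 1.0 - np.exp(-sigmas * deltas)              # per-sample opacity
    trans = np.cumprod(1.0 - alphas + 1e-10)             # transmittance up to and including each sample
    trans = np.concatenate(([1.0], trans[:-1]))          # shift: transmittance *before* each sample
    weights = alphas * trans                             # contribution of each sample to the pixel
    colour = (weights[:, None] * colours).sum(axis=0)    # C(x, y)
    depth = (weights * t_vals).sum()                     # Depth(x, y)
    return colour, depth
```

The same per-sample weights yield both the pixel colour and the depth value later consumed by the PSF model.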
The coordinate transform module 810 converts image sensor pixel locations (x, y) from Cartesian coordinates to Polar coordinates and can apply sensor priors, as described above. In this example, the output of the coordinate transform module is frequency encoded at 811.
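A minimal sketch of the frequency encoding step, assuming a NeRF-style sinusoidal encoding (the number of bands and the exact formulation are assumptions, not taken from the source):

```python
import numpy as np

def frequency_encode(values, num_bands=6):
    """Map each scalar input v to [sin(2^l * pi * v), cos(2^l * pi * v)] for l = 0..num_bands-1,
    lifting the inputs to a higher dimensionality before they are fed to the MLP."""
    v = np.asarray(values, dtype=np.float64).ravel()
    freqs = (2.0 ** np.arange(num_bands)) * np.pi        # 2^l * pi
    angles = np.outer(v, freqs)                          # (num_inputs, num_bands)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1).ravel()
```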
The PSF model 812 in this example is a multi-layer perceptron (MLP) model. The learnable PSF lens model 812 is trained to learn PSF kernel weights K_ij 813 for each location on an image sensor of the camera (i.e. for each pixel and optionally one or more neighbouring pixels in a patch N(x,y)), given the depth at each location and the camera lens focal length, focus distance and lens aperture, as shown at 819. In this example, these parameters are frequency encoded at 820 before being input to the model 812.
The PSF kernels K_ij 813 output from the PSF model 812 are modulated with continuous depth mask weights 815, based on pixel-wise depth differences from a central pixel in a patch N(x,y), using a continuous depth mask shown at 814, which in this example uses a Gaussian function.
By applying the depth mask weights W_ij to the kernel weights K_ij, each learned point spread function is modified in dependence on the respective depth value of the respective pixel to form a modified point spread function for each pixel. This results in a spatial- and depth-varying kernel at 816.
The transformations of spatial and depth-varying point spread functions are then applied to rendered images. In this example, this is performed using a convolution (dot product), as will be described in more detail with reference to Figure 12. The module 817 applies the learnt PSF kernel to the RGB colour values C(x,y) of a pixel or a patch of pixels centred on a particular sensor location (x,y), modulated by the continuous depth mask, to generate images 818 rendered with depth-of-field blur.
As described above, the learnable PSF model for a particular camera lens learns to generate PSFs for each pixel. The PSFs may also be termed blur kernels. The PSF for a respective pixel may be applied to the respective pixel and one or more neighbouring pixels of a patch comprising the respective pixel. For example, the PSF for a respective pixel may be applied to a patch of 9 pixels, 16 pixels, 25 pixels, or so on (with the respective pixel at the center of the patch). This may be performed for all pixels of the image.
One implementation of the network design for a PSF neural field model is shown in Figure 8. In this example, the PSF model is a neural field represented by a 3-layer fully connected multi-layer perceptron neural network (MLP) 807. From an RGB image 801 and a corresponding depth map 802, comprising RGB colour and depth data for multiple pixels respectively, the model takes as input a sensor location (x,y) and the scene depth d(x,y) at that location and outputs the corresponding blur kernel weights k_ij. Parameters 806 of a camera 805 (such as focal length, focus distance and aperture) can also be fed into the PSF model 807. In this implementation, all inputs to the MLP 807 are encoded to a higher dimensionality using a coordinate transform 803 (for example, from Cartesian to Polar coordinates) and frequency encoding 804, following the standard neural rendering procedure. This frequency encoding increases the input dimensionality. Alternative neural network architectures may be used. The output of the MLP model 807 is the PSF blur kernel weights. In this example, the output is a kernel 808 in vector form, which is reshaped into an s × s matrix 809 containing s^2 elements. 810 is an example of how such a point spread function may look for real data. The model 807 is fully differentiable so can be trained either separately (pre-trained) or end-to-end within a neural rendering framework.
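A minimal sketch of such a PSF neural field is given below. The hidden width, activation function, kernel size and the softmax normalisation of the output weights are all assumptions made for illustration; only the overall structure (encoded inputs in, s × s kernel weights out) follows the description above.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(in_dim, hidden=64, kernel_size=5):
    """Random initialisation of a 3-layer fully connected MLP; in practice the weights are learned."""
    dims = [in_dim, hidden, hidden, kernel_size * kernel_size]
    return [(rng.normal(0.0, 0.1, (d_in, d_out)), np.zeros(d_out))
            for d_in, d_out in zip(dims[:-1], dims[1:])]

def psf_mlp(params, encoded_inputs, kernel_size=5):
    """Map encoded sensor location, depth and camera parameters to an s x s blur kernel."""
    h = encoded_inputs
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)                       # ReLU on the hidden layers
    k = np.exp(h - h.max())                              # softmax so the weights are positive
    k /= k.sum()                                         # and sum to one (normalisation is an assumption)
    return k.reshape(kernel_size, kernel_size)
```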
The transformation of the coordinates (x,y) of the image sensor 900 from Cartesian coordinates to Polar coordinates (r, θ) before being input to the MLP is schematically illustrated in Figure 9. This transformation constrains the learned PSF kernel weights to a single quadrant 901 of the image sensor 900, exploiting the symmetric properties of the image sensor. Therefore, symmetry of the learned PSFs is enforced by restricting inputs to a single quadrant 901 of the sensor 900. This can reduce the learnable space to one quarter of the total sensor space.
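One possible realisation of this coordinate transform, assuming the angle is folded into a single quadrant by taking absolute offsets from the sensor centre (the exact folding used in the embodiment is not specified here):

```python
import numpy as np

def to_symmetric_polar(x, y, width, height):
    """Convert a sensor pixel location (x, y) to polar coordinates (r, theta) about the sensor
    centre, folding the angle into one quadrant to exploit the sensor's assumed symmetry."""
    dx = x - (width - 1) / 2.0
    dy = y - (height - 1) / 2.0
    r = np.hypot(dx, dy)                    # radial distance from the sensor centre
    theta = np.arctan2(abs(dy), abs(dx))    # folded into [0, pi/2], i.e. a single quadrant
    return r, theta
```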
As described above, the PSF modulation mechanism implements a function that generates a continuous weight mask based on the respective depths of each pixel of a patch of pixels to modulate the learned PSF kernel. This mechanism can therefore output a soft weight mask for a patch of pixels to prevent blurring of the PSF kernels across parts of the scene at different depths, instead of splitting the depth into discontinuous planes.
One example of such a function is shown in Figure 10. In this implementation, a Gaussian function is used to produce weights W_ij for a patch of pixels 1000 based on the depth difference of a respective pixel 1001 of the patch 1000 from the centre pixel 1002 of the patch of pixels (i.e. d_i'j' − d_ij, where d is depth). The weight mask is used to prevent blurring across parts of the scene at different depths by applying the weights to the kernel values of a patch of pixels. Using a simple Gaussian function, the larger the depth discontinuity from the centre pixel of a given patch, the less weight is applied. The hyperparameter of the Gaussian's standard deviation σ can be pre-defined and fixed, or could be a learnt parameter. Alternatively, the function could be represented by a neural network.
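A minimal sketch of this Gaussian depth mask, with the standard deviation treated as a fixed hyperparameter (it could equally be learned, as noted above):

```python
import numpy as np

def depth_mask_weights(depth_patch, sigma=0.1):
    """Soft weight mask W_ij for one patch: the larger a pixel's depth difference from the
    central pixel, the smaller its weight, preventing blur across depth discontinuities."""
    centre = depth_patch[depth_patch.shape[0] // 2, depth_patch.shape[1] // 2]
    diff = depth_patch - centre                         # d_i'j' - d_ij
    return np.exp(-(diff ** 2) / (2.0 * sigma ** 2))    # Gaussian fall-off with depth difference
```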
As illustrated in Figure 11, this mechanism is equivalent to splitting the scene into depth planes, but instead of discretized planes, the soft weight depth mask transforms this into a continuous weighting function for each patch 1-4 shown in the figure, which is better for optimization purposes and model expressivity.
As schematically illustrated in Figure 12, the PSF application module applies the corresponding PSF at each sensor location (i.e. for each respective pixel and one or more neighbouring pixels in a patch comprising the respective pixel at the centre of the patch) to produce the final RGB output image C with rendered depth-of-field blur. This module generates the final image pixel values with rendered depth-of-field blur. It fuses a patch of rendered colour pixels, with each pixel having a respective colour value c_i'j', the learned PSF kernel K_ij at that sensor location, and the modulating weight mask produced by the PSF modulation mechanism. In this exemplary implementation, a simple dot product of the colour values with the kernel weights and PSF modulation weights, summed over the patch, is performed. Other implementations may alternatively be used, for example using a dictionary of learned kernels and applying them in the fast Fourier Transform (FFT) domain.
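A minimal sketch of this fusion step for a single patch; the renormalisation of the masked kernel is an assumption added so that flat regions keep their brightness, not something stated in the source:

```python
import numpy as np

def apply_psf(colour_patch, kernel, mask):
    """Fuse a rendered colour patch (s, s, 3) with its learned PSF kernel (s, s) and depth
    mask (s, s): a weighted sum over the patch gives the blurred output pixel."""
    weights = kernel * mask
    weights = weights / (weights.sum() + 1e-8)          # renormalise after masking (assumption)
    return (weights[..., None] * colour_patch).sum(axis=(0, 1))
```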
To train the pixel-wise reconstruction, the loss (L1 loss) between the ground truth image I and the predicted image Î can be minimized according to the following loss function:
L = ||I − Î||_1
Figure 13 shows the steps of an exemplary method for forming an enhanced image. At step 1301, the method comprises receiving data for an input image, the data comprising respective colour and depth values for each pixel of multiple pixels of the input image. In some implementations, the data for the input image may comprise respective colour and depth values for every pixel of the input image. At step 1302, the method comprises receiving a trained point spread function model comprising a respective learned point spread function for each pixel of the multiple pixels of the input image. At step 1303, the method comprises modifying each learned point spread function in dependence on the respective depth value of the respective pixel to form a respective modified point spread function for each pixel of the multiple pixels of the input image. At step 1304, the method comprises applying the respective modified point spread function to the respective colour value for one or more pixels comprising the respective pixel to form the enhanced image. The respective modified point spread function may be applied to a patch of pixels centred on the respective pixel. This may be performed for each pixel of the multiple pixels of the input image. This may be performed for every pixel of the input image to form the enhanced image.
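Pulling the pieces together, steps 1301 to 1304 can be sketched as below, assuming the helper functions from the earlier sketches (frequency_encode, to_symmetric_polar, init_mlp/psf_mlp, depth_mask_weights and apply_psf) are in scope; the names, feature layout and patch handling are illustrative assumptions rather than the claimed implementation.

```python
import numpy as np

def form_enhanced_image(colours, depths, mlp_params, cam_params, s=5):
    """Sketch of steps 1301-1304: for every pixel, query the learned PSF model, modulate the
    kernel with the local depth mask and apply it to the surrounding colour patch."""
    H, W, _ = colours.shape
    pad = s // 2
    c_pad = np.pad(colours, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    d_pad = np.pad(depths, pad, mode="edge")
    out = np.zeros_like(colours)
    for y in range(H):
        for x in range(W):
            r, theta = to_symmetric_polar(x, y, W, H)               # coordinate transform
            feats = np.concatenate([frequency_encode([r, theta]),   # encoded sensor location
                                    frequency_encode([depths[y, x]]),
                                    frequency_encode(cam_params)])  # focal length, focus distance, aperture
            kernel = psf_mlp(mlp_params, feats, kernel_size=s)      # learned PSF (step 1302)
            mask = depth_mask_weights(d_pad[y:y + s, x:x + s])      # depth modulation (step 1303)
            out[y, x] = apply_psf(c_pad[y:y + s, x:x + s], kernel, mask)  # blurred pixel (step 1304)
    return out
```

With six frequency bands and three camera parameters, the encoded feature vector has 72 elements, so mlp_params = init_mlp(72, kernel_size=5) would match this sketch.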
This method may be implemented on-device (for example, on a smartphone) or externally, such as on cloud-based services.
Figure 14 shows an example of an image processing apparatus for performing the methods described herein. The apparatus 1400 may comprise at least one processor, such as processors 1401, and at least one memory, such as memory 1402. The memory stores, in a non-transient way, code that is executable by the processor(s) to implement the apparatus in the manner described herein. The apparatus may also comprise one or more image sensors 1403 configured to capture an image which can then be input to a rendering pipeline implemented by the processor 1401 or used to train the rendering pipeline in the manner described herein.
This AI system can enable depth-of-field blur to be rendered in images produced by a novel view synthesis method such as NeRF. The PSF lens blur model is represented by a neural field, which can be conditioned on sensor location, depth and captured camera parameters (such as focal length, focus distance and aperture). The depth weighting mechanism modulates the learned blur kernels to prevent blurring across parts of the scene at significantly different depths. The learnt lens blur kernels and depth weighting are then applied to image patches rendered by a neural rendering framework to generate images with depth-of-field blur.
The implementation of the learned PSF model in the rendering pipeline can lead to a more expressive camera model, capable of generating images with realistic depth-of-field blur, rather than the sharp all-in-focus images normally generated by standard NeRF models. The PSF model can also lead to better geometry (depth) reconstruction when the learnt lens-specific PSF model is incorporated into a NeRF framework and trained end-to-end. This is because the NeRF has to update the depth map to produce sharp colours before the blur model is applied. Furthermore, by incorporating a learnable lens model into a neural rendering framework, lens-specific (scene-agnostic) blur can be learned from real images. Thus, the learned PSFs are generalizable across scenes, and novel views can be rendered and the blur controlled a priori.
This approach therefore provides a controllable and learnable system that generates DoF blur by taking colour and depth values from volumetric rendering and applying point spread functions. The neural field can learn spatially-varying kernel weights K_ij based on real camera parameters (focal length, focus distance, aperture value), transformed sensor coordinates, and depth values.
The present approach can enable learning of spatial- and depth-varying PSF kernels for each location on the sensor, leading to the realistic reconstruction of images containing DoF blur. Transforming sensor coordinates (e.g. polar coordinates in a quotient space) enables priors such as PSF symmetry. Conditioning the neural field on camera parameters enables view synthesis with novel camera parameters.
The continuous depth mask provides a smooth function that modulates the PSF kernels based on pixel-wise depth differences, preventing blurring across large depth discontinuities and avoiding problems associated with discretized depth planes. This enables the application of the learnt PSF kernels to the colours and depths provided by neural rendering, resulting in renders with realistic DoF blur.
Figures 15a-15c show qualitative results of fitting a lens PSF model to a synthetic 3D scene. The present model is able to render the out-of-focus blur accurately, with a PSNR of ~50 dB when compared to the ground truth blurry image. Figure 15a shows an all-in-focus input image, Figure 15b shows a ground truth blurry image and Figure 15c shows a predicted blurry image obtained using the present approach, with an example of a PSF blur kernel learned through optimization of the model.
Incorporating the learned PSF lens model (pre-trained on a dataset of sharp-blurry image pairs) into an end-to-end NeRF framework improves the depth geometry reconstruction and enables sharp images to be recovered from blurry input images. This is because the PSF model enables the NeRF to learn an internal scene representation that is sharp before the volume-rendered pixels are convolved with their corresponding blur kernels. This result is shown in Figures 16a-16c. Figure 16a shows a ground truth sharp image, Figure 16b shows a blurry input image and Figure 16c shows a recovered sharp image. Incorporating the learned lens PSF model into the NeRF framework and training end-to-end results in sharper depth reconstruction and enables the sharp image to be recovered from blurry inputs.
Figures 17a-17e show some further qualitative results on real world data. Figure 17a shows the all-in-focus input, Figure 17b shows the ground truth blur, and Figure 17c shows the result of fitting the PSF blur model to a real 3D scene captured with the all-in-focus and blurry image pair of Figures 17a and 17b. Figures 17d and 17e show novel views which can be rendered with the learned PSF blur; this blur can be modified by controlling the blur kernels.
Some further advantages of this solution are that the lens model is conditioned on real camera parameters and thus enables rendering of images with novel camera parameters. The ambiguity between aperture and focus distance is mitigated since the lens PSF model is pre-trained with labelled paired data, i.e. the lens model is conditioned on camera parameters. As discussed above, the PSF lens model can be pre-trained, for example in the lab, using paired sharp and blurry images with known camera parameters as input. This enables the lens model to be generalizable across scenes, as it is specific to the lens and is scene-agnostic. The trained point spread function model can advantageously be learned based on camera parameters of an image sensor. The model is therefore also controllable and conditioned on lens parameters, enabling novel views to be rendered with novel camera parameters and allowing the depth-of-field to be changed after capture and NeRF reconstruction (whereas in most prior NeRF methods, the depth-of-field is baked into the NeRF model and cannot be changed). As the PSF model can advantageously be conditioned on known camera settings (such as focus distance and aperture from the camera's EXIF data), which are fed into the model, this enables easy control of the blur by changing the input camera parameters. Therefore, the method can learn to disambiguate between focus distance and aperture value and enables novel view synthesis with new camera parameters (such as aperture and focus distance).
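As a hypothetical usage example (the parameter values and the form_enhanced_image helper from the sketch above are illustrative, not from the source), re-rendering with a different aperture only requires changing the camera parameters fed to the learned PSF model:

```python
import numpy as np

# Re-render the same reconstruction with two different apertures by changing only the
# camera parameters passed to the learned PSF model.
cam_f1_8 = np.array([50.0, 1.5, 1.8])    # focal length (mm), focus distance (m), f-number
cam_f16 = np.array([50.0, 1.5, 16.0])    # stopping down should shrink the learned blur kernels

# shallow_dof = form_enhanced_image(colours, depths, mlp_params, cam_f1_8)
# deep_dof = form_enhanced_image(colours, depths, mlp_params, cam_f16)
```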
In contrast to previous methods, dense PSF kernels are learned, where the PSF function is applied to a central pixel and multiple neighbouring pixels (i.e. a patch of pixels). The blur rendering is lens-specific (i.e. scene-agnostic blur, where the blur rendering does not depend on the scene being captured). Embodiments of the present invention can therefore provide a general lens model that is scene-agnostic (and camera and lens dependent instead) and thus generalizable across different scenes.
The PSF model described herein can therefore learn arbitrary blur PSF kernels from real data and is fully spatial- and depth-varying, resulting in a more expressive and realistic model compared to the pinhole camera model used in standard NeRF frameworks.
Thus, by incorporating a PSF model for a camera lens into a neural rendering framework, the behaviour of real lenses can be better modelled, and images can be reconstructed with realistic depth-of-field blur. When trained end-to-end within a NeRF framework, a sharp all-in-focus internal representation of the 3D scene is learnt which, when rendered using standard volume rendering techniques, enables controllable depth-of-field blur given novel camera parameters.
The applicant hereby discloses in isolation each individual feature described herein and any combination of two or more such features, to the extent that such features or combinations are capable of being carried out based on the present specification as a whole in the light of the common general knowledge of a person skilled in the art, irrespective of whether such features or combinations of features solve any problems disclosed herein, and without limitation to the scope of the claims. The applicant indicates that aspects of the present invention may consist of any such individual feature or combination of features. In view of the foregoing description it will be evident to a person skilled in the art that various modifications may be made within the scope of the invention.

Claims

1. An image processing apparatus (1400) for forming an enhanced image (717, 818), the image processing apparatus (1400) being configured to: receive (1301) data for an input image, the data comprising respective colour and depth values for each pixel of multiple pixels of the input image; receive (1302) a trained point spread function model (711, 812) comprising a respective learned point spread function (712, 813) for each pixel of the multiple pixels of the input image; modify (1303) each learned point spread function (712, 813) in dependence on the respective depth value of the respective pixel to form a respective modified point spread function (715, 816) for each pixel of the multiple pixels of the input image; and apply (1304) the respective modified point spread function (715, 816) to the respective colour value for one or more pixels comprising the respective pixel to form the enhanced image (717, 818).
2. The image processing apparatus (1400) as claimed in claim 1, wherein the apparatus is configured to form the enhanced image so as to render depth-of-field blur in the enhanced image (717, 818).
3. The image processing apparatus (1400) as claimed in any preceding claim, wherein the apparatus is configured to modify each learned point spread function (712, 813) in dependence on a respective depth mask weight (714, 815) for the respective pixel, each depth mask weight (714, 815) being determined based on a pixel-wise depth difference from a central pixel in the multiple pixels of the input image.
4. The image processing apparatus (1400) as claimed in any preceding claim, wherein each learned point spread function (712, 813) comprises a matrix of weights (813) corresponding to the respective pixel and multiple neighboring pixels.
5. The image processing apparatus (1400) as claimed in any preceding claim, wherein each learned point spread function (712, 813) is further modifiable in dependence on one or more parameters (718, 819) of an image sensor and camera lens that captured the input image.
6. The image processing apparatus (1400) as claimed in claim 5, wherein the one or more parameters (718, 819) comprise one or more of focal length, focus distance and aperture value.
7. The image processing apparatus (1400) as claimed in any preceding claim, wherein the learned point spread function model (711, 812) is specific to a particular lens of an image sensor used to capture the input image and wherein the learned point spread functions (712, 813) vary spatially across the image sensor.
8. The image processing apparatus (1400) as claimed in any preceding claim, wherein the apparatus is further configured to convert spatial locations in the input image to another coordinate system and apply one or more known properties of an image sensor used to capture the input image to the converted spatial locations.
9. The image processing apparatus (1400) as claimed in any preceding claim, wherein the trained point spread function model (711, 812) is scene-agnostic.
10. The image processing apparatus (1400) as claimed in any preceding claim, wherein the received data is an output of an image rendering model (705, 805).
11. The image processing apparatus (1400) as claimed in claim 10, wherein the input image corresponds to a novel view output from the image rendering model (805).
12. The image processing apparatus (1400) as claimed in any preceding claim, wherein the trained point spread function model (812) is a multi-layer perceptron neural network.
13. The image processing apparatus (1400) as claimed in any preceding claim, wherein the trained point spread function model (711, 812) is trained end-to-end with the image rendering model (705, 805).
14. The image processing apparatus (1400) as claimed in any preceding claim, wherein the trained point spread function model (711, 812) is trained using paired sharp-blurry images with labelled lens parameters.
15. A method (1300) for forming an enhanced image (717, 818), the method comprising: receiving (1301) data for an input image, the data comprising respective colour and depth values for each pixel of multiple pixels of the input image; receiving (1302) a trained point spread function model (711, 812) comprising a respective learned point spread function (712, 813) for each pixel of the multiple pixels of the input image; modifying (1303) each learned point spread function (712, 813) in dependence on the respective depth value of the respective pixel to form a respective modified point spread function (715, 816) for each pixel of the multiple pixels of the input image; and applying (1304) the respective modified point spread function (715, 816) to the respective colour value for one or more pixels comprising the respective pixel to form the enhanced image (717, 818).

Non-Patent Citations (9)

* Cited by examiner, † Cited by third party

B. Mildenhall et al., "NeRF in the Dark: High Dynamic Range View Synthesis from Noisy Raw Images", Computer Vision and Pattern Recognition Conference, 2022
B. Mildenhall et al., "NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis", European Conference on Computer Vision, 2020
M. Bauer et al., "Automatic Estimation of Modulation Transfer Functions", IEEE International Conference on Computational Photography (ICCP), 2018, pages 1-12, XP033352260, DOI: 10.1109/ICCPHOT.2018.8368467 *
Busam et al., "Sterefo: Efficient Image Refocusing with Stereo Vision", International Conference on Computer Vision Workshops, 2019
Li et al., "Deblur-NeRF: Neural Radiance Fields from Blurry Images", Computer Vision and Pattern Recognition Conference, 2022
M. Bauer et al., "Automatic Estimation of Modulation Transfer Functions", IEEE International Conference on Computational Photography, 2018, pages 1-12
T. Hach et al., "Cinematic Bokeh Rendering for Real Scenes", Proceedings of the 12th European Conference on Visual Media Production (CVMP '15), 2015, pages 1-10, XP055632848, ISBN: 978-1-4503-3560-7, DOI: 10.1145/2824840.2824842 *
Wang et al., "NeRFocus: Neural Radiance Field for 3D Synthetic Defocus", arXiv:2203.05189, 2022
Wu et al., "DoF-NeRF: Depth-of-Field Meets Neural Radiance Fields", ACM International Conference on Multimedia, 2022
