WO2022175688A1 - Image processing - Google Patents

Info

Publication number
WO2022175688A1
Authority
WO
WIPO (PCT)
Prior art keywords
images
polarization
map
polarized
reflectance
Prior art date
Application number
PCT/GB2022/050470
Other languages
French (fr)
Inventor
Abhijeet Ghosh
Valentin Deschaintre
Yiming Lin
Original Assignee
Imperial College Innovations Limited
Priority date
Filing date
Publication date
Priority claimed from GB2102482.3A (GB2603951A)
Application filed by Imperial College Innovations Limited
Publication of WO2022175688A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/55 Depth or shape recovery from multiple images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/90 Determination of colour characteristics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person

Definitions

  • the present invention relates to image processing, in particular to estimating three-dimensional shape and spatially-varying reflectance of an object from a set of images of the object.
  • Li et al. "Learning to reconstruct shape and spatially-varying reflectance from a single image", ACM Transactions on Graphics, volume 37, no. 6, Article 269 (2018) (herein referred to as "Li et al.") describes recovering spatially-varying bidirectional reflectance distribution functions (SVBRDFs) and complex geometry from a single RGB image captured under a combination of unknown environment illumination and flash lighting by training a deep neural network to regress shape and reflectance from the image.
  • V. Deschaintre et al. "Single-Image SVBRDF Capture with a Rendering-Aware Deep Network", ACM Transactions on Graphics, volume 37, no. 4, Article 128 (2018) (herein referred to as "Deschaintre et al.") describes using a neural network to reconstruct complex SVBRDFs of planar samples given a single input photograph under flash illumination, based on training using only synthetic data.
  • Kadambi et al. "Polarized 3D: High-quality depth sensing with polarization cues", Proceedings of the IEEE International Conference on Computer Vision, pages 3370-3378 (2015) (herein referred to as "Kadambi et al.") describes using polarization to enhance depth maps obtained using a Microsoft (RTM) Kinect depth sensor.
  • Y. Ba et al. "Deep shape from polarization", European Conference on Computer Vision (ECCV), 2020 (herein referred to as "Ba et al.") describes a deep learning-based approach to inferring the shape of a surface under uncontrolled environment illumination using polarization imaging. Both Kadambi et al. and Ba et al. only estimate shape.
  • Boss et al. "Two-shot spatially-varying brdf and shape estimation", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020 (herein referred to as "Boss et al.") describes a cascaded network and guided prediction networks for SVBRDF and shape estimation from two-shot images, under flash and ambient environmental illumination respectively.
  • a method comprising receiving a set of at least three images of an object including at least two linearly-polarized images (for example, at least two linearly-polarized color images) and at least one color image (which may or may not be linearly-polarized), wherein the three images have the same view of the object and are acquired under the same illumination condition (in other words, for each of the at least three images, the object is illuminated in the same way, e.g., from the same, single fixed point, from the same, multiple fixed points, or from the same fixed range or extent of illumination) in which either diffuse polarization or specular polarization dominates in surface reflectance, and wherein a set of Stokes parameters s 0 , s 1 and s 2 is determinable from the at least three images.
  • the method further comprises generating three-dimensional shape and spatially-varying reflectance of the object from the set of at least three images using a deep neural network trained with a plurality of sets of training images, each of the plurality of sets of training images including at least three training images including at least two linearly-polarized training images and at least one color image from which a respective set of Stokes parameters s 0 , s 1 and s 2 is determinable and storing said three-dimensional shape and spatially-varying reflectance generated by the deep neural network.
  • the three-dimensional shape and spatially-varying reflectance can be used to render a high-quality image of the object under new lighting conditions.
  • the images are preferably acquired under controlled illumination, for example, indoors or, if outdoors, under cloudy conditions or other suitably shaded conditions providing uniform illumination, whereby polarized illumination is minimised or minimal such that it is not dominant.
  • the illumination may be flash illumination such that diffuse polarization dominates and, thus, the Stokes map may be based on diffuse polarization.
  • the illumination may be unpolarized.
  • the flash illumination may, however, be linearly polarized or circularly polarized.
  • the flash illumination may include a mixture of polarized light (linearly- and/or circularly-polarized light) and/or unpolarized light.
  • the illumination may be uniform and surround the object (e.g., spherical or hemispherical illumination) such that specular polarization dominates.
  • the uniform illumination may be unpolarized or circularly-polarized for non-planar 3D objects.
  • the uniform illumination may include a mixture of circularly-polarized light and unpolarized light for non-planar 3D objects.
  • the uniform illumination may be from an extended or a sufficiently large area light source or light panel or display panel, or even locally uniform environmental illumination.
  • the uniform illumination from an extended area-light may be unpolarized, linearly-polarized or circularly-polarized.
  • the set of at least three images may comprise at least three color images.
  • the at least two linearly-polarized images and at least one color image may comprise at least two linearly-polarized colour images.
  • a set of Stokes parameters s 0 , s 1 and s 2 is determinable from the at least three images, for example, if the at least two linearly-polarized images include first and second linearly-polarized images whose angles of polarization are separated by 45°.
  • a set of Stokes parameters s 0 , s 1 and s 2 is determinable from the at least three images, for example, if the at least two linearly-polarized images include first, second and third linearly-polarized images in which the angles of polarization are 0°, 45° and 90° respectively.
  • the set of linear Stokes parameters may be determined by a different combination of angles of polarization, such as, for example, 0°, 60° and 120° respectively.
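  • As an illustration of why three polarizer angles suffice, the following is a minimal sketch, not the patented implementation. It assumes the standard ideal-polarizer model I(θ) = 0.5·(s0 + s1·cos 2θ + s2·sin 2θ) and solves it per pixel by least squares with NumPy. The function name `stokes_from_polarized` is hypothetical.

```python
import numpy as np

def stokes_from_polarized(images, angles_deg):
    """Recover per-pixel (s0, s1, s2) from images taken through a linear polarizer.

    Assumes the ideal model I(theta) = 0.5 * (s0 + s1*cos(2*theta) + s2*sin(2*theta)).
    """
    theta = np.deg2rad(np.asarray(angles_deg, dtype=float))
    # One row of the linear system per polarizer angle.
    A = 0.5 * np.stack(
        [np.ones_like(theta), np.cos(2 * theta), np.sin(2 * theta)], axis=1
    )
    I = np.stack([np.asarray(im, dtype=float) for im in images], axis=0)
    flat = I.reshape(len(theta), -1)
    # Exact solve for three angles, least-squares fit for four or more.
    s, *_ = np.linalg.lstsq(A, flat, rcond=None)
    return s.reshape((3,) + I.shape[1:])
```

  • The same routine would accept either of the angle sets mentioned above (0°, 45°, 90° or 0°, 60°, 120°), since both give a well-conditioned 3×3 system. The case of two polarized images plus one unpolarized image is not covered by this sketch.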
  • At least the unpolarised Stokes parameter s 0 has color.
  • the horizontally polarized reflectance Stokes parameter s 1 and/or the 45° polarization reflectance Stokes parameter s 2 may have color.
  • the object may be a three-dimensional object, that is, an object which is not substantially flat or planar, and/or includes one or more convex surface(s).
  • the three-dimensional object may include whole or part of a human subject (e.g., face or full-body), an animal or a plant.
  • the object may be a planar object.
  • the method may further comprise receiving a polarization shape map generated from the Stokes parameters s 1 and s 2 for the object and/or a colour map and/or a degree of polarization (DOP) map, or generating a polarization shape map from the Stokes parameters s 1 and s 2 for the object and/or a colour map and/or a DOP map using the set of at least three images.
  • the three-dimensional shape and spatially-varying reflectance may be generated from the set of at least three images and the polarization shape map and/or the colour map and/or the DOP map.
  • the color map may be a diffuse color map.
  • the polarization shape map may be a normalised Stokes map or an angle of polarization map.
  • the degree of polarization (DOP) map may be a diffuse DOP map or a specular DOP map.
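  • A DOP map can be sketched from the linear Stokes parameters using the usual definition DOP = sqrt(s1² + s2²) / s0. This is a minimal illustration under that assumption. Whether the result is read as a diffuse or a specular DOP map depends on which polarization dominates under the capture illumination, which this code does not decide. The function name and the `eps` guard are hypothetical.

```python
import numpy as np

def degree_of_linear_polarization(s0, s1, s2, eps=1e-8):
    """Per-pixel degree of linear polarization, clipped to [0, 1]."""
    s0 = np.asarray(s0, dtype=float)
    polarized = np.hypot(s1, s2)            # magnitude of the polarized part
    dop = polarized / np.maximum(s0, eps)   # guard against division by zero
    return np.clip(dop, 0.0, 1.0)
```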
  • the plurality of sets of training images may comprise a plurality of sets of synthesized training images.
  • the plurality of sets of synthesized training images may be generated using a plurality of meshes of objects and a plurality of different spatially-varying bidirectional reflectance distribution functions (SVBRDFs) corresponding to different materials.
  • Generation of a training image may include selecting a mesh and a material and randomly rotating the mesh and material.
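  • The random rotation step can be sketched as follows. This is an assumption-laden illustration: it draws a uniformly random rotation from a normalized Gaussian quaternion and applies it to mesh vertices only. How the material itself is rotated is not specified here and is not modelled. `random_rotation_matrix` and `rotate_vertices` are hypothetical names.

```python
import numpy as np

def random_rotation_matrix(rng):
    """Uniform random 3D rotation from a normalized Gaussian quaternion."""
    q = rng.normal(size=4)
    q /= np.linalg.norm(q)
    w, x, y, z = q
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ])

def rotate_vertices(vertices, R):
    """Apply rotation R to an (N, 3) array of vertex positions."""
    return np.asarray(vertices, dtype=float) @ R.T
```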
  • the plurality of sets of synthesized training images may comprise a plurality of images modelling either diffuse or specular polarization in surface reflectance.
  • the plurality of sets of training images may comprise a plurality of sets of measured training images.
  • the plurality of sets of measured training images may comprise a plurality of images exhibiting either diffuse or specular polarization in surface reflectance.
  • the at least three images of the object may comprise three or four linearly-polarized images, for example, three or four linearly-polarized color images.
  • the three-dimensional shape may comprise a surface normal map and a depth map.
  • the spatially-varying reflectance may comprise a diffuse albedo map, and a specular albedo map, and/or a specular roughness map.
  • the deep neural network may comprise a convolutional neural network having an encoder and a decoder and skip connections between the encoder and decoder.
  • the decoder may be a branched decoder comprising at least two branches.
  • the skip connections may include at least one residual block or a series of at least two residual blocks.
  • the deep neural network may be trained by considering rendering losses for each linearly-polarized image.
  • the deep network may include a parallel arrangement of a U-Net image-to-image network and a global features network.
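  • The branched decoder can be pictured as a mapping from branches to output maps. The sketch below follows the three-branch grouping given in the Figure 8 description (normal and depth; diffuse albedo; roughness and specular albedo). The channel counts, branch names and the helper `split_branch_output` are assumptions for illustration and say nothing about the actual layer configuration.

```python
import numpy as np

# Decoder branches and their output maps, following the Figure 8 grouping.
# Channel counts are assumptions (RGB-like maps = 3, scalar maps = 1).
DECODER_BRANCHES = {
    "shape": {"normal": 3, "depth": 1},
    "diffuse": {"diffuse_albedo": 3},
    "specular": {"roughness": 1, "specular_albedo": 3},
}

def split_branch_output(branch_name, tensor):
    """Split a (C, H, W) branch output into its named maps."""
    layout = DECODER_BRANCHES[branch_name]
    expected = sum(layout.values())
    if tensor.shape[0] != expected:
        raise ValueError(f"expected {expected} channels, got {tensor.shape[0]}")
    maps, start = {}, 0
    for name, channels in layout.items():
        maps[name] = tensor[start:start + channels]
        start += channels
    return maps
```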
  • the set of at least three images may be acquired using frontal flash illumination (which may be unpolarized, or linearly or circularly polarized) incident on the object so as to cause diffuse polarization to dominate in the surface reflectance.
  • the frontal illumination can be from a flash or a projector.
  • the set of at least three images may be acquired using uniform illumination (which may be unpolarized or circularly polarized) disposed around and directed at the object so as to cause specular polarization to dominate in the surface reflectance.
  • the uniform illumination may comprise a plurality of light sources arranged in a hemisphere or sphere around the object, or surrounding the object, to provide uniform illumination on the object.
  • uniform illumination can be achieved using an extended or a sufficiently large area-light source or light panel or display panel, or locally-uniform environmental illumination incident on the object at near normal incidence or obliquely incident at near Brewster angle of incidence.
  • the uniform illumination may be unpolarized, linearly polarized or circularly polarized.
  • a method comprising receiving a set of linearly-polarized color images of an object, each linearly-polarized image having a different angle of polarization, the linearly-polarized color images having the same view of the object and acquired using unpolarized, frontal, flash illumination of the object.
  • the method may optionally include receiving a reflectance map and a shape map for the object generated from the set of linearly-polarized images.
  • the method comprises generating three-dimensional shape and spatially-varying reflectance of the object from the set of linearly-polarized images, and optionally the reflectance map and the shape map, using a deep neural network trained with a synthetic or measured dataset, wherein the synthetic or measured dataset includes a plurality of sets of data, each set of data including a set of linearly-polarized images having different polarizations, and optionally a reflectance map and a shape map generated from the linearly-polarized images, and ground truth three-dimensional shape and spatially-varying reflectance and storing said three-dimensional shape and spatially-varying reflectance generated by the deep neural network.
  • a computer program comprising instructions for performing the method of the first or second aspect.
  • a computer program product comprising a computer readable medium (which may be non-transitory) storing the computer program of the third aspect.
  • a device comprising at least one processor and storage.
  • the at least one processor is configured, in response to receiving a set of at least three images of an object including at least two linearly-polarized images and at least one color image, wherein the three images have the same view of the object and are acquired under the same illumination condition in which either diffuse polarization or specular polarization dominates, wherein a set of Stokes parameters s 0 , s 1 and s 2 is determinable from the at least three images, to generate three-dimensional shape and spatially-varying reflectance of the object from the set of at least three images using a deep neural network trained with a plurality of sets of training images, each of the plurality of sets of training images including at least three training images including at least two linearly-polarized training images and at least one color image from which a respective set of Stokes parameters s 0 , s 1 and s 2 is determinable and to store said three-dimensional shape and spatially-varying reflectance generated by the deep neural network in the storage.
  • the at least one processor may receive a polarization shape map generated from the Stokes parameters s 1 and s 2 for the object and/or a colour map and/or a degree of polarization (DOP) map.
  • the at least one processor may further be configured to generate a polarization shape map from the Stokes parameters s 1 and s 2 for the object and/or a colour map and/or a DOP map using the set of at least three color images.
  • the at least one processor may be configured to generate three-dimensional shape and spatially-varying reflectance from the set of at least three color images and the polarization shape map and/or the colour map and/or the DOP map.
  • the device may further comprise a color digital camera and a linear polarizing filter for acquiring the at least three color images.
  • the device may further comprise or be provided with a flash or a projector for providing directional illumination on the object, preferably from a frontal direction.
  • the device may further comprise or be provided with one or more light sources (for example, light emitting diodes, light panels or display panels) and, optionally, one or more reflecting surfaces arranged around the object to provide uniform illumination on the object. Light from the one or more light sources may be bounced from the one or more reflecting surface(s).
  • the one or more light sources may comprise a plurality of light sources arranged in a hemisphere or sphere around the object.
  • the one or more reflecting surfaces may comprise a plurality of reflecting surfaces arranged in a hemisphere or sphere around the object.
  • the reflecting surface(s) may be concave.
  • the reflecting surface(s) may provide diffuse reflection.
  • the at least one processor may include one or more central processing units (CPUs).
  • the at least one processor may include one or more graphical processing units (GPUs).
  • a method of training a deep neural network comprises providing a plurality of sets of training images and corresponding ground truth three-dimensional shape and spatially-varying reflectance of objects to a deep neural network, each set of training images including at least three training images including at least two linearly-polarized training images (for example, at least two linearly-polarized color images) and at least one color image (which may or may not be linearly-polarized) from which a respective set of Stokes parameters s 0 , s 1 and s 2 is determinable; and storing the trained deep neural network.
  • the method may further comprise providing a polarization shape map generated from the Stokes parameters s 1 and s 2 and/or a colour map and/or a DOP map.
  • the set of training images may comprise a plurality of sets of synthesized training images and/or measured training images.
  • according to a seventh aspect of the present invention, there is provided a computer program comprising instructions for performing the method of the sixth aspect.
  • a computer program product comprising a computer readable medium (which may be non-transitory) storing the computer program of the seventh aspect.
  • apparatus comprising at least one processor and storage for training a deep neural network.
  • Figure 1 is a block diagram of a system for estimating object shape and spatially-varying bidirectional reflectance distribution function (SVBRDF) from polarization cues, the system including a trained deep network;
  • Figure 2A illustrates a system for capturing polarized images
  • Figure 2B illustrates captured polarized images (or "captured inputs")
  • Figure 2C illustrates computed explicit cues (or "captured inputs")
  • Figure 2D illustrates synthetic training data used to train a deep network shown
  • Figure 3 illustrates shape and SVBRDF estimated for an object in the form of normal, diffuse, specular, roughness and depth maps
  • Figure 4 is a process flow diagram of a method of estimating object shape and SVBRDF
  • Figure 5 is a process flow diagram of a method of training a deep network
  • Figure 6A is an ideal normalized Stokes map for a sphere under frontal flash illumination
  • Figure 6B is a practical signal captured with a measured Stokes map of a rubber ball with embossed text under flash illumination
  • Figures 7A and 7B illustrate specular reflection on planar surfaces, namely a brick wall and a color chart, due to unpolarized sky acquired at oblique angle of incidence using a linear polarizer in front of a camera at horizontal 0° orientation ("Max") and at vertical 90° orientation ("Min") respectively;
  • Figure 7C is degree of polarization computed from Max and Min and which appears to contain cues about the surface specular roughness
  • Figure 8 illustrates a deep network architecture which has a general U-Net and in which decoders are divided into three different branches, each handling a related set of output map(s), namely normal and depth, diffuse albedo, roughness and specular albedo, and in which res-blocks are introduced on the skip connections between the encoder and the different branches of the decoder allowing the network to adapt the information forwarded to the different branches of the decoder;
  • Figure 9 illustrates comparisons of results on synthetic data produced by the method herein described and those produced by methods described in Li et al. and Boss et al.
  • Figure 10 illustrates comparisons of results on real objects produced by the method herein described and those produced by the method described in Li et al.
  • Figure 11A illustrates a mixed Stokes map of a ball under complex lighting
  • Figure 11B illustrates an example of an exploitable pure Stokes map of a lemon
  • Figures 12A and 12B illustrate plots of angle of incidence (x-axis) versus measured diffuse degree of polarization (DOP) (y-axis) for two spherical balls;
  • Figure 12C illustrates a plot of angle of incidence (x-axis) versus simulated DOP using a polynomial fit to measured data
  • Figures 13A, 13B, 13C and 13D illustrate simulated linearly-polarized images at 0°, 45°, 90°, 135° respectively of a sphere having a surface comprised of tiled, green stone material acquired using frontal flash illumination;
  • Figure 14 is a simulated normalized color map of the sphere shown in Figures 13A to 13D; and Figure 15 is a simulated Stokes map obtained from the linearly-polarized images shown in Figures 13A to 13D illustrating dominance of diffuse polarization which is independent of the polarization state of flash illumination.
  • HDR high-dynamic range
  • Figure 1 shows a system 1 which can be used to capture polarised color images of an object 2, to use the captured images to compute further cues and to estimate object shape and SVBRDF using a trained deep network.
  • Figure 1 also shows a system 3 which is used to generate synthetic training data which is used to train the deep network.
  • an image-capturing system 4 for capturing linearly-polarized images 5 (or "captured inputs") of an object 2 includes a color digital camera 6 (herein simply referred to as a "camera"), a linear polarizing filter 7 mounted on the lens unit 8 of the camera 6, and a light source 9 in the form of an unpolarized flash.
  • the camera 6 may be stably mounted on a tripod 10 (not shown in Figure 1).
  • a color checker chart 11 for white balancing and radiometric calibration of the observed reflectance may be provided.
  • the color checker chart can be omitted particularly if the measurements are pre-calibrated.
  • Off-the-shelf equipment can be used for the image-capturing system 4.
  • the camera 6 takes the form of a digital single-lens reflex (DSLR) camera, although other forms of digital cameras can be used.
  • a camera may be used having an integrated polarization sensor such as a Sony (RTM) Polarsens (RTM), thereby obviating the need for an external linear polarizing filter. Images need not be captured under laboratory or studio conditions, provided flash illumination is the dominant illumination.
  • the object may be all or part of a human subject, such as the face or body, an animal or a plant.
  • first, second and third linearly-polarized images 5 1 , 5 2 , 5 3 are captured, with the angle of polarization of the filter 7 set to 0°, 90° and 45° respectively.
  • a fourth polarized image 5 4 can be captured with the angle of polarization set to 135°.
  • the image 5 4 can be constructed from the other images 5 1 , 5 2 , 5 3 . Other angles for polarization can be used.
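  • One way the fourth image could be constructed is sketched below, assuming ideal linear polarizers, for which orthogonal pairs sum to the same total intensity (I0 + I90 = I45 + I135 = s0). The function name `synthesize_135` is hypothetical, and real captures with noise or saturation would only satisfy this approximately.

```python
import numpy as np

def synthesize_135(i0, i45, i90):
    """Estimate the 135-degree image from the 0, 45 and 90 degree images.

    Uses I0 + I90 = I45 + I135, which holds for ideal linear polarizers.
    """
    i0, i45, i90 = (np.asarray(x, dtype=float) for x in (i0, i45, i90))
    return i0 + i90 - i45
```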
  • the first, second and third linearly-polarized images 5 1 , 5 2 , 5 3 have angles of polarization set to 0°, 60° and 120° respectively. Not all the captured images 5 1 , 5 2 , 5 3 need be linearly-polarized.
  • the first and second images 5 1 , 5 2 may be linearly-polarized having angles of polarization set to 0° and 45° and the third image 5 3 may be unpolarized.
  • the same illumination condition is used to capture the images.
  • the object is illuminated in the same way from the same, single fixed point, i.e., the flash, which is in a fixed position.
  • multiple different illumination conditions are not used for the set of (three) images, for example, by positioning the flash in different positions or by using another flash in a different position for a different image acquisition when acquiring each respective image.
  • single, frontal flash illumination need not be used.
  • the same illumination conditions can be provided by multiple fixed points (such as a spherical or hemispherical array of light sources) or from the same fixed extended range of illumination (such as light panels) or other fixed illumination arrangements.
  • the same or substantially the same illumination light intensity is preferably used.
  • an image processing system 12 which may be implemented in software on a processor-based computer system (not shown), can be used to generate computed images 14 (or "further cues") from the captured images 5 (step S2).
  • the computed images 14 include a normalised color map 14 1 and a polarization shape map 14 2 (or "π-ambiguous shape map") in which the shape cue is computed from the horizontally polarized reflectance s 1 and 45° polarization reflectance s 2 and which takes the form of a normalised Stokes map 14 2 .
  • the computed images 14 can also include an unpolarized flash image (or "s 0 image").
  • a data processing system 15 which may be implemented in software, hardware, a field programmable gate array (FPGA), or using a graphics processing unit (GPU), in a computer system (not shown), is used to estimate the 3D shape and SVBRDF 18, specifically a normal map 18 1 , a diffuse map 18 2 , a roughness map 18 3 , a specular map 18 4 and a depth map 18 5 (step S3).
  • the shape and SVBRDF 18 can be stored in storage 19 (step S4) and subsequently used by a rendering system 20 to display the object 2 on a display 21 (step S5).
  • the data processing system 15 implements a deep network 22 which is trained using training data 23 and which generates the appearance- and shape-related maps 18 from the captured inputs 5 and optionally the computed inputs 14.
  • Linear polarization cues in surface reflectance are used to provide strong initial cues to the deep network 22. While polarization imaging close to the Brewster angle allows extraction of many appearance cues directly, this can generally only be done reliably for planar surfaces and reference is made to Riviere et al. ibid. Accordingly, deep learning is used to compensate for the limitations of the polarization signal over the surface of a 3D object 2.
  • the training data 23 can take the form of synthetic training data, measured training data (or "real training data"), or a mixture of synthetic and measured training data. Synthesizing training data can help to generate a large volume of training data more quickly than acquiring measured training data.
  • the deep network 22 includes a parallel arrangement of a U-Net image-to-image network 29 (hereinafter referred to simply as the "U-Net”) and a global features network 30 similar to that described in Deschaintre et al. ibid., which is incorporated herein by reference.
  • the U-Net 29 is trained to employ polarization images 5 of the object 2 as input along with explicit cues 14 provided by the polarization signal 5, and to output five maps 18 related to appearance and shape, namely diffuse and specular albedo 18 2 , 18 4 , specular roughness 18 3 , surface normal 18 1 and depth 18 5 .
  • two specific cues 14 1 , 14 2 i.e., channels of information
  • the first is a reflectance cue 14 1 in the form of normalized diffuse color computed by normalizing the reflectance minima obtained (through sinusoidal fitting) from the acquired polarized images.
  • the second is a shape cue 14 2 , in particular a π-ambiguous shape map, in the form of a normalized Stokes map.
  • the normalized Stokes map encodes the self-normalized s 1 , s 2 components of Stokes parameters of linear polarization and computes the normalized variation in the reflectance under different polarization filter orientations, providing a π-ambiguous initialization for surface normals.
  • An angle of polarization map computed from s 1 , s 2 could be used instead of the normalized Stokes map as a shape cue.
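  • The two alternative shape cues can be sketched as follows. This is a minimal illustration assuming the usual definitions: the normalized Stokes map divides (s1, s2) by their joint magnitude, and the angle of polarization is 0.5·atan2(s2, s1). Because both depend only on twice the azimuth angle, azimuths φ and φ + π give identical cues, which is the π ambiguity. Function names and the `eps` guard are hypothetical.

```python
import numpy as np

def normalized_stokes_map(s1, s2, eps=1e-8):
    """Normalize s1 and s2 with respect to each other (unit-length 2-vector per pixel)."""
    s1 = np.asarray(s1, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    mag = np.maximum(np.hypot(s1, s2), eps)
    return s1 / mag, s2 / mag

def angle_of_polarization(s1, s2):
    """Angle of polarization in radians, in the range (-pi/2, pi/2]."""
    return 0.5 * np.arctan2(s2, s1)
```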
  • a synthetic dataset 23 is created (by the generator 24) consisting of 20 complex 3D geometries of realistic objects mapped with procedurally and artistically generated SVBRDFs based on a dataset disclosed in V. Deschaintre et al., "Guided fine-tuning for large-scale material transfer", Computer Graphics Forum (Proceedings of the Eurographics Symposium on Rendering), volume 39, no. 4 (2020). Other combinations can be used. For example, other, different 3D geometries can be used, other different numbers of geometries and other, different materials can be used, and/or another different SVBRDF dataset can also be employed for creating the training dataset.
  • Specialised decoder branches 33 1 , 33 2 , 33 3 are employed in the network 22 to output high-quality shape and reflectance parameter maps, and a mix of L1 and rendering loss is used to train the network 22. Rendering loss is further improved by developing a differentiable polarized renderer, providing better gradients on the diffuse and specular behaviours.
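  • The idea of mixing an L1 loss on the predicted maps with a rendering loss can be sketched as below. This is a toy illustration only: `toy_render` is a plain Lambertian shader standing in for the differentiable polarized renderer, NumPy provides no gradients, and the weighting `render_weight` and the map names are assumptions.

```python
import numpy as np

def l1(a, b):
    """Mean absolute difference between two arrays."""
    return float(np.mean(np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))))

def toy_render(maps, light_dir=(0.0, 0.0, 1.0)):
    """Toy Lambertian shading (stand-in only): diffuse albedo * max(n . l, 0)."""
    n = np.asarray(maps["normal"], dtype=float)
    n = n / np.maximum(np.linalg.norm(n, axis=-1, keepdims=True), 1e-8)
    cos = np.clip(n @ np.asarray(light_dir, dtype=float), 0.0, None)
    return np.asarray(maps["diffuse"], dtype=float) * cos[..., None]

def mixed_loss(pred, target, render_weight=1.0):
    """L1 on each parameter map plus a weighted L1 between renderings."""
    map_loss = sum(l1(pred[k], target[k]) for k in target)
    render_loss = l1(toy_render(pred), toy_render(target))
    return map_loss + render_weight * render_loss
```

  • Comparing renderings as well as raw maps penalises parameter errors according to their visible effect, which is the motivation for a rendering loss. In the arrangement described above this would be evaluated for each linearly-polarized image rather than a single shaded image.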
  • the image-capturing system 4 (i.e., the camera 6, the polarizer 7, the lens 8 and the flash 9), the image processing system 12, the data processing system 15 and the rendering system 20 may be integrated into one device.
  • the training dataset 23 is generated using 20 complex meshes of realistic objects and 2000 different materials (SVBRDFs).
  • the test dataset 23 uses 6 unique meshes and 30 materials.
  • Renderings are generated for four polarization filter angles, namely 0°, 45°, 90°, and 135°, and the s 0 image, alongside the ground truth SVBRDF and depth maps.
  • the dataset is further augmented with a normalized Stokes map and normalized diffuse color that are computed from the different polarized renderings.
  • the dataset could be also augmented with a degree of polarization (DOP) map.
  • Figure 6A shows an ideal normalized Stokes map for a sphere under frontal flash illumination.
  • RGB color coding for Stokes vectors R (s 0 ) is set to 0.5
  • Figure 6B illustrates the signal captured in practice with a measured Stokes map of a rubber ball with embossed text under flash illumination.
  • Synthetic generation is augmented with Gaussian noise to mimic the perturbation in the acquisition process.
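  • A noise augmentation of this kind might look like the following sketch. The noise level `sigma`, the clipping to non-negative values and the function name are assumptions; the actual noise parameters are not given here.

```python
import numpy as np

def add_gaussian_noise(image, sigma=0.01, rng=None):
    """Add zero-mean Gaussian noise to a rendered image and keep values non-negative."""
    rng = np.random.default_rng() if rng is None else rng
    image = np.asarray(image, dtype=float)
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0.0, None)
```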
  • HDR data capture is simulated and 16-bit Portable Network Graphics (PNG) images are used.
  • Figure 2D gives examples of the synthetic dataset 23.
  • s0 represents the unfiltered reflectance
  • s1 represents the horizontally polarized reflectance
  • s2 represents the 45° polarization reflectance.
  • Directly-measured Stokes parameters depend on the bidirectional reflectance distribution function (BRDF) of the surface and the lighting conditions.
  • BRDF bidirectional reflectance distribution function
  • s1 and s2 are normalised with respect to each other to extract the directional information about the surface normal up to a π ambiguity. Normalized Stokes parameters are used as an additional cue for the network, helping to disambiguate the shape from the reflectance, improving shape and SVBRDF acquisition.
  • measured Stokes parameters consist of a mix of contributions from specular and diffuse polarization caused by their respective reflectance. These two types of polarization are captured by the Fresnel equations on surface reflectance and transmission for specular and diffuse polarization respectively.
  • specular polarization usually dominates under direct area illumination. This tends to be the reason why previous approaches to polarization under controlled spherical illumination modelled only specular polarization.
  • A. Ghosh et al.: "Circularly polarized spherical illumination reflectometry", ACM Trans. Graph. (Proc. SIGGRAPH Asia), vol. 29, pp. 162:1-162:12 (2010) and G. C. Guarnera et al.
  • the minimum intensity information does not necessarily fall exactly at the three polarization angles captured. Therefore, a sinusoidal fitting per pixel is performed by the image processing system 12 for each observation (Ih, Iv and I45) to fit the
  • the minimum reflectance values are normalised to extract the normalized diffuse color, which is provided to the network as a reflectance cue. This color information can, however, be lost in some over-saturated pixels caused by the extreme dynamic range of flash illumination, despite HDR imaging, and may require image in-painting to fill in the saturated pixels.
  • the above sinusoidal fitting to the measurements can also be used to compute the maximum reflectance value which, in conjunction with the minimum reflectance value, can be used to compute the degree of polarization (DOP) of reflectance as: $\mathrm{DOP} = (I_{max} - I_{min}) / (I_{max} + I_{min})$
  • DOP can encode some shape information for a 3D object.
  • the DOP increases with increasing angle of incidence for diffuse polarization, as illustrated in, for example, Figure 12.
  • a planar object such as a brick wall
  • DOP due to specular polarization can also encode surface reflectance information related to specular roughness.
  • the deep network 22 is trained to output diffuse and specular albedos 18₂, 18₄, specular roughness 18₃, normal map 18₁ and depth map 18₅ of the input object 2 (Figure 1).
  • An encoder-decoder architecture 31, 32 is employed.
  • the decoder architecture 32 is split into three branches 33₁, 33₂, 33₃, each specialized in an aspect of shape or appearance.
  • the specular albedo and roughness maps 18₄, 18₃ are grouped in one branch 33₃ and the normal and depth maps 18₁, 18₅ are grouped in another 33₁ as they are closely related.
  • a third branch 33₂ handles the diffuse albedo 18₂.
  • All three branches 33₁, 33₂, 33₃ of the decoder 32 receive the same inputs from the encoder 31, but the skip connections 34 are made more flexible.
  • two res-blocks 35, 36 and a convolution layer 37 are added to the skip connections, allowing the training process to adjust the information transferred to each decoder branch 33₁, 33₂, 33₃ from the encoder 31.
  • the res-blocks 35, 36 on the skip connections 34 allow the network 22 to forward the most relevant information to each separate decoder branch 33₁, 33₂, 33₃, helping to decorrelate the diffuse response from the other parameters. This can help to preserve high-frequency features in all of the reflectance and shape maps, as each of these has a different scale and dynamic range, as well as decorrelating the details in the predicted maps from each other.
  • the network is trained on 512 x 512 images.
  • Polarization rendering loss: the network 22 is trained using two losses, namely an L1 loss to regularize the training, computing an absolute difference between the output maps and the targets, and a polarized rendering loss.
  • the rendering loss used by Deschaintre et al. only computes losses (i.e., errors) for standard renderings based on predicted versus ground truth reflectance and shape maps.
  • Polarized rendering loss computes losses (i.e., errors) for more sophisticated renderings that include specular and diffuse polarization simulations.
  • Rendering losses can be efficient in training reflectance acquisition methods. These are improved by simulating the polarization behaviour of surface reflectance in a differentiable fashion, allowing gradients of rendering effects from diffuse and specular polarization to be taken into account in the training process.
  • Acquisition procedure: the acquisition process involves capturing an object 2 under flash illumination with three polarization filter orientations, namely 0°, 45°, and 90°.
  • a DSLR camera 6, a tripod 10 and a linear polarizing filter 7 are used and the polarizer 7 is manually rotated on the lens 8 to acquire the data 5.
  • polarization sensors, e.g., Sony (RTM) Polarsens (RTM), can be used which allow rapid capture of this information in a single shot.
  • a small color checker 11 next to the captured object 2 is used for white balancing and HDR capture, using autoexposure bracketing on the camera, to better extract the polarization information and match the object appearance as closely as possible.
  • the acquisition process takes around a minute.
  • A typical acquisition scene is illustrated in Figure 2A.
  • the method herein described is quantitatively compared to Li et al. ibid. and Boss et al. ibid. using the L1 distance.
  • the errors on the normal maps, on depth and directly on renderings are evaluated, as these are not affected by the different BRDF models chosen by the different methods. This numerical evaluation is performed on 250 combinations of 6 randomly rotated meshes and 30 SVBRDFs.
  • the rendering error is computed over 20 renderings for each result with varying light properties. Table 1 below shows that the method strongly benefits from the polarization cues, white balancing and HDR imaging with significantly lower error on depth, normal and renderings. Table 1
  • Figure 9 shows a comparison based on synthetic test data.
  • the method produces more plausible results and better captures the appearance of the input. While the re-renderings (far right column) and shape can be directly compared, the BRDF parameters maps are provided for qualitative evaluation as different BRDF models are used by the different methods.
  • the inputs are adapted to each method and the published codes for Li et al. ibid. and Boss et al. ibid. are used to generate results.
  • the method captures the global 3D shape of the object much better than single-image methods.
  • An important distinction over each of these is that the method does not correlate the SVBRDF variation in the input to normal variation in the output, as the Stokes map disambiguates this information.
  • Figure 10 shows results on real objects.
  • the method better recovers the global shape of the object as well as its appearance showing that it generalizes well to real acquisition. This is particularly seen in the rendering under a new flash lighting direction where the results using the method demonstrate appropriate shading variation due to the estimated surface normal and reflectance maps.
  • Table 2: The contribution of the different technical components, computed over the test set, is evaluated. For each column, training was performed without the component, namely (a) improved skip connections, (b) polarized rendering loss and (c) polarization cues. The normal error is reported in degrees, while the rest are reported as an L1 distance. For all parameters, a lower value is better. Both the improved skip connections and the polarized rendering loss improve results, but most importantly the polarization cues significantly improve the results on all recovered properties.
  • the first column of Table 2 evaluates the method with standard skip connections.
  • the res-blocks 35, 36 (Figure 8) on the skip connections allow the network 22 (Figure 8) to forward the most relevant information to each separate decoder branch 33₁, 33₂, 33₃ (Figure 8), helping to decorrelate diffuse response from the other parameters.
  • Such a correlation effect is visible in Figure 9 in Li et al.'s result, for example.
  • the second column of Table 2 evaluates the method with a rendering loss similar to V. Deschaintre et al. ibid.
  • the differentiable polarized renderings that are implemented help the network to better separate the diffuse and specular signals, with a small improvement in the roughness and specular albedo, but mostly in de-lighting the diffuse albedo.
  • Polarization cues
  • the third column of Table 2 evaluates the method with a single HDR, white-balanced flash input without any polarization information. All the recovered parameters suffer significantly from the absence of polarization cues. The rendering error of this single-image variant is found to be lower than that of the compared methods, which can be attributed to the use of a white-balanced, HDR input and training on complex meshes, helping to recover the global curvature.
  • flash illumination arrangement can be used to exploit a clean signal of diffuse polarization for shape and reflectance estimation
  • other illumination arrangements such as an inwardly-directed circular or spherical array of light panels, for example, as described in US 2021/05015 A1, can be used to exploit a clean signal of specular polarization for shape and reflectance estimation.
  • the deep network needs to be trained accordingly on specular polarization cues.
  • Stokes parameters are a set of values that describe the polarization state of light in terms of its total intensity.
  • DOP degree of polarization,
  • shape parameters of the polarization ellipse consist of four vectors: where s0 is the total intensity of the light, s1 and s2 are the intensity of 0° and +45° polarization respectively, and s3 is the intensity of right circular polarization.
  • s0 is the total intensity of the light
  • s1 and s2 are the intensity of 0° and +45° polarization respectively
  • s3 is the intensity of right circular polarization.
  • Mueller calculus where is and s ref are Stokes vectors of the incident light and reflected light respectively, is the Mueller matrix of rotation which rotates the incident Stokes vector in the global frame (same as the camera frame in our case) into the canonical frame of reference (plane of incidence), is the concatenation of the Mueller reflection matrix and a linear retarder of phase ⁇ . The term rotates the result back to the camera frame, hence the angle.
  • the concatenation of the Mueller matrices of a linear di-attenuator calculates the Stokes vectors of light upon reflection off the surface, in the local plane of incidence frame.
  • the initial Stokes vectors are defined in the global frame and therefore the Mueller rotation Matrix is required to align these two frames: where ⁇ is the angle between the y direction of the right-hand global frame and the normal ⁇ of the surface.
  • specular reflectance coefficients are replaced by transmission coefficients: and the refractive index of the material that light is incident on becomes 1 /n 2 as the light gets scattered and comes out from the material.
  • Although equation A5 gives the correct diffuse polarization orientation in renderings compared to real measurements, the DOP does not match actual observations.
  • the observed diffuse DOP can go up to approximately 10% at an incidence angle of roughly 85° for common dielectric materials.
  • Atkinson & Hancock ibid. report the diffuse DOP as reaching roughly 25% for materials with an index of refraction (IOR) 1.4 at an 85° admittance angle.
  • diffuse DOP is slightly reduced explaining the 10% observed.
  • Figures 12A and 12B show measured diffuse DOP (y-axis) for different angles of incidence (x-axis: in radians) on two spherical balls.
  • Figure 12C shows simulated diffuse DOP using a polynomial fit to measured data.
  • the diffuse polarization is rendered based on equation A5, with the following approximations: -
  • the diffuse polarization calculated for ⁇ from range is stretched and mapped to range o - ⁇ . This is due to a mirror reflection assumption in equation A5 which yields a total internal reflection within the range
  • the architecture is based on U-Net 29 with a joint encoder 31 with 9 convolutions with stride 2 and kernel size 4. Between each layer a Leaky Relu
  • the decoder 32 is split into three branches 33₁, 33₂, 33₃ specialized in different aspects of appearance.
  • the branches 33₁, 33₂, 33₃ respectively output (1) depth and normal 18₅, 18₁, (2) diffuse albedo 18₂ and (3) roughness and specular albedo 18₃, 18₄.
  • Each deconvolution is composed of a 2x upsampling and two 3x3 convolutions with stride 1.
  • the encoder 31 is connected to the decoder branches through skip connections 34 to propagate high frequency details.
  • Two residual blocks 35, 36 and a 3x3 convolution are added to each skip connection 34, allowing the network 22 to learn which information is most relevant to each decoder branch 33₁, 33₂, 33₃. More than two residual blocks can be used.
  • Each residual block 35, 36 is composed of two 3x3 convolutional layers with stride 1 and Relu activation functions.
  • the network 22 was trained for 5 days (1,000,000 steps) on a GPU, in particular, a single Nvidia RTX 2080 Ti. A batch size of 2 and a learning rate of 0.00002 were used. The network is fully convolutional and trained on 512 x 512 images.
  • the loss function uses a distance between the parameter maps for regularization with a weight of 0.25 and a polarized rendering loss, computing four polarization angles for three different lighting conditions with a weight of 1.0.
  • the distance is measured between parameters with an L1 distance except for the normal map, for which a cosine distance is used.
  • Images can be acquired using frontal flash, in which case diffuse polarization dominates and the Stokes map is based on diffuse polarization. Diffuse polarization is independent of the polarization state of incoming illumination. Thus, flash light can be unpolarized, linearly polarized or even circularly polarized.
  • As illustrated in Figures 13A to 13D, 14 and 15, polarization of the flash does not change the measured Stokes map, and the deep network could be trained with training data simulating the unpolarized or polarized state of flash illumination and/or with real measurements under such illumination.
  • Figures 13A to 13D illustrate simulation of a linearly-polarized frontal flash on a sphere with tiled green stone material, and Figure 15 shows the resulting Stokes map, which is dominated by diffuse polarization and is therefore independent of the polarization state of the flash.
  • Images can be acquired using uniform surrounding illumination (for example, spherical or hemispherical) in which case specular polarization dominates.
  • specular polarization dominates.
  • very similar Stokes maps can be obtained using unpolarized or circularly polarized illumination.
  • the Stokes map due to specular polarization is a rotated version of the Stokes map due to diffuse polarization.
  • the deep network could be trained with training data simulating the unpolarized or circularly polarized state of uniform surrounding spherical/hemispherical illumination and/or with similar real measured data.
  • The main difference between flash illumination and surrounding illumination is that, with uniform surrounding illumination, if the incident illumination is linearly polarized in a specific orientation, then the resulting Stokes map may not be a good cue for surface shape (unless the object is planar).
  • linearly-polarized illumination can provide a very good reflectance cue for diffuse and specular albedo.
  • a special case is planar objects where uniform illumination on a planar object can be achieved using just an area-light source or light panel or display panel that is sufficiently large, or locally uniform environmental illumination.
  • an area-light/panel illumination or environmental illumination can be incident from the front (near normal incidence) or obliquely at near Brewster angle of incidence. Specular polarization dominates in the resulting surface reflectance of a planar object and, similar to the surrounding illumination case, the deep network could be trained with training data simulating the unpolarized or circularly polarized state of uniform illumination from an area-light/light-panel or environment and/or with similar real measured data. Furthermore, for the case of planar objects, the deep network could also be trained with training data simulating the linearly polarized state of uniform illumination from an area-light/light-panel or environment and/or with similar real measured data.
  • Image capture for shape and spatially varying reflectance estimation here described can be used to render images used in computer graphics applications such as visualization, visual effects, augmented reality, virtual reality, computer games and e-commerce.
  • the object may be a plant, animal or human (e.g., the whole body) or a part of a plant, animal or human (such as a face or hand).
  • the object may be an inanimate object or part of an inanimate object.
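
As a numerical cross-check on the diffuse DOP figures quoted in the list above (roughly 25% at 85° for an index of refraction of 1.4), the following minimal sketch evaluates the closed-form Fresnel-transmission expression for diffuse DOP that is commonly attributed to Atkinson and Hancock. It is not the document's equation A5 and not the polynomial fit to measured data; the function name `diffuse_dop` and its arguments are hypothetical.

```python
import math


def diffuse_dop(theta, n):
    """Theoretical diffuse degree of polarization for a smooth dielectric.

    theta: angle between the surface normal and the viewing direction (radians)
    n:     index of refraction of the material
    Assumption: closed-form Fresnel-transmission model, not equation A5.
    """
    sin_sq = math.sin(theta) ** 2
    numerator = (n - 1.0 / n) ** 2 * sin_sq
    denominator = (2.0 + 2.0 * n * n
                   - (n + 1.0 / n) ** 2 * sin_sq
                   + 4.0 * math.cos(theta) * math.sqrt(n * n - sin_sq))
    return numerator / denominator


if __name__ == "__main__":
    for angle_deg in (0, 30, 60, 85):
        print(angle_deg, round(diffuse_dop(math.radians(angle_deg), 1.4), 4))
```

The value at 85° for n = 1.4 comes out near 0.25, in line with the theoretical figure cited in the list; the lower DOP of about 10% observed in practice is what motivates the fit to measured data described there.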

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

A method of image processing is described. The method comprises receiving a set of at least three images of an object including at least two linearly-polarized images and at least one color image, wherein the three images have the same view of the object and are acquired under the same illumination condition in which either diffuse polarization or specular polarization dominates in surface reflectance, and wherein a set of Stokes parameters s0, s1 and s2 is determinable from the at least three images. The method further comprises generating three-dimensional shape and spatially-varying reflectance of the object from the set of at least three images using a deep neural network trained with a plurality of sets of training images, each of the plurality of sets of training images including at least three training images including at least two linearly-polarized training images and at least one color image from which a respective set of Stokes parameters s0, s1 and s2 is determinable and storing said three-dimensional shape and spatially-varying reflectance generated by the deep neural network.

Description

Image processing
Field
The present invention relates to image processing, in particular to estimating three-dimensional shape and spatially-varying reflectance of an object from a set of images of the object.
Background
Accurately acquiring the shape and appearance of real-world objects and materials has been an active area of research in vision and graphics with a wide range of applications including, for example, analysis/recognition, and digitization for visual effects, games, virtual reality, cultural heritage, advertising and design. Advances in digital imaging over the last two decades have resulted in image-based acquisition techniques becoming an integral component of appearance modelling and three-dimensional (3D) reconstruction.
J. Riviere et al.: "Polarization imaging reflectometry in the wild", ACM Transactions on Graphics, volume 36, no. 6, Article 206 (2017) describes on-site acquisition of surface reflectance for planar, spatially varying, isotropic samples in an uncontrolled outdoor environment. It employs linear-polarization imaging from two, near-orthogonal views, close to the Brewster angle of incidence, to maximize polarization cues for surface reflectance estimation.
Z. Li et al.: "Learning to reconstruct shape and spatially-varying reflectance from a single image", ACM Transactions on Graphics, volume 37, no. 6, Article 269 (2018) (herein referred to as "Li et al.") describes recovering spatially-varying bidirectional reflectance distribution functions (SVBRDFs) and complex geometry from a single RGB image captured under a combination of unknown environment illumination and flash lighting by training a deep neural network to regress shape and reflectance from the image.
V. Deschaintre et al.: "Single-Image SVBRDF Capture with a Rendering-Aware Deep Network", ACM Transactions on Graphics, volume 37, no. 4, Article 128 (2018) (herein referred to as "Deschaintre et al.") describes using a neural network to reconstruct complex SVBRDFs of planar samples given a single input photograph under flash illumination, based on training using only synthetic data.
A. Kadambi et al.: "Polarized 3D: High-quality depth sensing with polarization cues", Proceedings of the IEEE International Conference on Computer Vision, pages 3370-3378 (2015) (herein referred to as "Kadambi et al.") describes using polarization to enhance depth maps obtained using a Microsoft (RTM) Kinect depth sensor.
Y. Ba et al.: "Deep shape from polarization", European Conference on Computer Vision (ECCV), 2020 (herein referred to as "Ba et al.") describes a deep learning-based approach to inferring the shape of a surface under uncontrolled environment illumination using polarization imaging. Both Kadambi et al. and Ba et al. only estimate shape.
M. Boss et al.: "Two-shot spatially-varying BRDF and shape estimation", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020 (herein referred to as "Boss et al.") describes a cascaded network and guided prediction networks for SVBRDF and shape estimation from two-shot images, under flash and ambient environmental illumination respectively.
Summary
According to a first aspect of the present invention there is provided a method comprising receiving a set of at least three images of an object including at least two linearly-polarized images (for example, at least two linearly-polarized color images) and at least one color image (which may or may not be linearly-polarized), wherein the three images have the same view of the object and are acquired under the same illumination condition (in other words, for each of the at least three images, the object is illuminated in the same way, e.g., from the same, single fixed point, from the same, multiple fixed points, or from the same fixed range or extent of illumination) in which either diffuse polarization or specular polarization dominates in surface reflectance, and wherein a set of Stokes parameters s0, s1 and s2 is determinable from the at least three images. The method further comprises generating three-dimensional shape and spatially-varying reflectance of the object from the set of at least three images using a deep neural network trained with a plurality of sets of training images, each of the plurality of sets of training images including at least three training images including at least two linearly-polarized training images and at least one color image from which a respective set of Stokes parameters s0, s1 and s2 is determinable, and storing said three-dimensional shape and spatially-varying reflectance generated by the deep neural network.
The three-dimensional shape and spatially-varying reflectance can be used to render a high-quality image of the object under new lighting conditions.
The images are preferably acquired under controlled illumination, for example, indoors or, if outdoors, under cloudy conditions or other suitably shaded conditions providing uniform illumination, whereby polarized illumination is minimised or minimal such that it is not dominant.
The illumination may be flash illumination such that diffuse polarization dominates and, thus, the Stokes map may be based on diffuse polarization. The illumination may be unpolarized. The flash illumination may, however, be linearly polarized or circularly polarized. The flash illumination may include a mixture of polarized light (linearly- and/or circularly-polarized light) and/or unpolarized light. The illumination may be uniform and surround the object (e.g., spherical or hemispherical illumination) such that specular polarization dominates. The uniform illumination may be unpolarized or circularly-polarized for non-planar 3D objects. The uniform illumination may include a mixture of circularly-polarized light and unpolarized light for non-planar 3D objects. For a planar object, the uniform illumination may be from an extended or a sufficiently large area light source or light panel or display panel, or even locally uniform environmental illumination. For planar objects, the uniform illumination from an extended area-light may be unpolarized, linearly-polarized or circularly-polarized.
The set of at least three images may comprise at least three color images. The at least two linearly-polarized images and at least one color image may comprise at least two linearly-polarized colour images.
A set of Stokes parameters s0, s1 and s2 is determinable from the at least three images, for example, if the at least two linearly-polarized images include first and second linearly-polarized images whose angles of polarization are separated by 45°. A set of Stokes parameters s0, s1 and s2 is determinable from the at least three images, for example, if the at least two linearly-polarized images include first, second and third linearly-polarized images in which the angles of polarization are 0°, 45° and 90° respectively. The set of linear Stokes parameters may be determined by a different combination of angles of polarization, such as, for example, 0°, 60° and 120° respectively.
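The way a set of Stokes parameters follows from three polarizer angles can be illustrated per pixel. The minimal sketch below assumes an ideal linear polarizer, so that the filtered intensity is I(θ) = 0.5 (s0 + s1 cos 2θ + s2 sin 2θ), and solves the resulting 3x3 linear system. The function name `stokes_from_three_angles` is hypothetical, and this is an illustration rather than the processing actually performed by the described system.

```python
import math


def stokes_from_three_angles(intensities, angles_deg):
    """Recover (s0, s1, s2) at one pixel from three polarizer-filtered intensities.

    Assumption: an ideal linear polarizer at angle theta transmits
    I(theta) = 0.5 * (s0 + s1 * cos(2 * theta) + s2 * sin(2 * theta)).
    """
    if len(intensities) != 3 or len(angles_deg) != 3:
        raise ValueError("this sketch expects exactly three measurements")

    # One row of the 3x3 linear system per polarizer angle.
    rows = []
    for angle in angles_deg:
        t = math.radians(angle)
        rows.append([0.5, 0.5 * math.cos(2.0 * t), 0.5 * math.sin(2.0 * t)])

    def det3(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    d = det3(rows)
    if abs(d) < 1e-12:
        raise ValueError("these angles do not determine s0, s1 and s2")

    # Cramer's rule: replace one column at a time with the measurements.
    stokes = []
    for col in range(3):
        m = [row[:] for row in rows]
        for r in range(3):
            m[r][col] = intensities[r]
        stokes.append(det3(m) / d)
    return tuple(stokes)
```

Under the same assumption, the 0°, 45°, 90° case reduces to s0 = I0 + I90, s1 = I0 - I90 and s2 = 2 I45 - s0, while the 0°, 60°, 120° combination is handled by the same solver without change.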
At least the unpolarised Stokes parameter s0 has color. The horizontally polarized reflectance Stokes parameter s1 and/or the 45° polarization reflectance Stokes parameter s2 may have color.
The object may be a three-dimensional object, that is, an object which is not substantially flat or planar, and/or includes one or more convex surface(s). The three-dimensional object may include whole or part of a human subject (e.g., face or full-body), an animal or a plant. The object, however, may be a planar object.
The method may further comprise receiving a polarization shape map generated from the Stokes parameters s1 and s2 for the object and/or a colour map and/or a degree of polarization (DOP) map, or generating a polarization shape map from the Stokes parameters s1 and s2 for the object and/or a colour map and/or a DOP map using the set of at least three images. The three-dimensional shape and spatially-varying reflectance may be generated from the set of at least three images and the polarization shape map and/or the colour map and/or the DOP map.
The color map may be a diffuse color map. The polarization shape map may be a normalised Stokes map or an angle of polarization map. The degree of polarization (DOP) map may be a diffuse DOP map or a specular DOP map.
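The polarization shape map and DOP map options named above can be sketched per pixel from s0, s1 and s2. The following is a minimal illustration using the standard definitions of normalised linear Stokes components, angle of polarization and degree of linear polarization. The function names and the `eps` guard are hypothetical; in `stokes_to_rgb` only the fixed value of 0.5 for the first channel is taken from this document, and the remapping of the other two channels is an assumption.

```python
import math


def polarization_cues(s0, s1, s2, eps=1e-8):
    """Per-pixel cues from the linear Stokes parameters.

    Returns (n1, n2, aop, dop):
      n1, n2: s1 and s2 normalised with respect to each other
      aop:    angle of polarization in radians, in (-pi/2, pi/2]
      dop:    degree of linear polarization
    """
    magnitude = math.hypot(s1, s2)
    if magnitude < eps:
        n1, n2 = 0.0, 0.0  # unpolarized pixel: no directional information
    else:
        n1, n2 = s1 / magnitude, s2 / magnitude
    aop = 0.5 * math.atan2(s2, s1)
    dop = magnitude / s0 if s0 > eps else 0.0
    return n1, n2, aop, dop


def stokes_to_rgb(n1, n2):
    """Colour-code a normalised Stokes pixel (assumed mapping of [-1, 1] to [0, 1])."""
    return (0.5, 0.5 * (n1 + 1.0), 0.5 * (n2 + 1.0))
```

Because the angle of polarization is half the angle of the (s1, s2) vector, it is only defined modulo π, which is why such a map constrains the surface normal only up to an ambiguity.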
The plurality of sets of training images may comprise a plurality of sets of synthesized training images. For example, the plurality of sets of synthesized training images may be generated using a plurality of meshes of objects and a plurality of different spatially-varying bidirectional reflectance distribution functions (SVBRDFs) corresponding to different materials. Generation of a training image may include selecting a mesh and a material and randomly rotating the mesh and material. The plurality of sets of synthesized training images may comprise a plurality of images modelling either diffuse or specular polarization in surface reflectance.
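The selection and random rotation step can be sketched as follows. This is only an illustration of one way to draw such samples: the mesh and material counts are those given for the training dataset elsewhere in this document, while the function names, the use of a unit quaternion for the mesh rotation and the treatment of the material rotation as a single in-plane angle are assumptions.

```python
import math
import random

NUM_MESHES = 20       # number of meshes stated for the training dataset
NUM_MATERIALS = 2000  # number of materials (SVBRDFs) stated for the training dataset


def random_unit_quaternion(rng):
    """Uniformly distributed 3D rotation as a unit quaternion (Shoemake's method)."""
    u1, u2, u3 = rng.random(), rng.random(), rng.random()
    a = math.sqrt(1.0 - u1)
    b = math.sqrt(u1)
    return (a * math.sin(2.0 * math.pi * u2),
            a * math.cos(2.0 * math.pi * u2),
            b * math.sin(2.0 * math.pi * u3),
            b * math.cos(2.0 * math.pi * u3))


def sample_training_scene(rng):
    """Pick a mesh and a material and draw random rotations for both."""
    return {
        "mesh_index": rng.randrange(NUM_MESHES),
        "material_index": rng.randrange(NUM_MATERIALS),
        "mesh_rotation": random_unit_quaternion(rng),
        # Assumption: the material is rotated in the texture plane only.
        "material_rotation_rad": rng.uniform(0.0, 2.0 * math.pi),
    }
```

Passing an explicitly seeded `random.Random` instance keeps the sampled combinations reproducible from run to run.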
Additionally or alternatively, the plurality of sets of training images may comprise a plurality of sets of measured training images. The plurality of sets of measured training images may comprise a plurality of images exhibiting either diffuse or specular polarization in surface reflectance.
The at least three images of the object may comprise three or four linearly-polarized images, for example, three or four linearly-polarized color images. The three-dimensional shape may comprise a surface normal map and a depth map.
The spatially-varying reflectance may comprise a diffuse albedo map, and a specular albedo map, and/or a specular roughness map.
The deep neural network may comprise a convolutional neural network having an encoder and a decoder and skip connections between the encoder and decoder. The decoder may be a branched decoder comprising at least two branches. The skip connections may include at least one residual block or a series of at least two residual blocks. The deep neural network trained by considering rendering losses for each linearly-polarized image. The deep network may include a parallel arrangement of a U- Net image-to-image network and a global features network. The set of at least three images may be acquired using frontal flash illumination (which maybe unpolarized, or linearly or circularly polarized) incident on the object so as to cause diffuse polarization to dominate in the surface reflectance. The frontal illumination can be from a flash or a projector. Alternatively, the set of at least three images may be acquired using uniform illumination (which may be unpolarized or circularly polarized) disposed around and directed at the object so as to cause specular polarization to dominate in the surface reflectance. The uniform illumination may comprise a plurality of light sources arranged in a hemisphere or sphere around the object, or surrounding the object, to provide uniform illumination on the object.
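The option of training by considering rendering losses for each linearly-polarized image can be sketched in simplified form. The weights (0.25 and 1.0), the four filter angles and the use of a cosine distance for normals follow the embodiment described elsewhere in this document. Everything else is an assumption made for illustration: the function names, the dictionary layout, the use of a single lighting condition, and the fact that per-pixel Stokes values are passed in directly instead of being produced from the predicted maps by a differentiable polarized renderer.

```python
import math

FILTER_ANGLES_DEG = (0.0, 45.0, 90.0, 135.0)  # four polarization filter angles
MAP_WEIGHT = 0.25     # weight of the parameter-map distance
RENDER_WEIGHT = 1.0   # weight of the polarized rendering loss


def filtered_intensity(stokes, angle_deg):
    """Intensity behind an ideal linear polarizer for a pixel (s0, s1, s2)."""
    s0, s1, s2 = stokes
    t = math.radians(angle_deg)
    return 0.5 * (s0 + s1 * math.cos(2.0 * t) + s2 * math.sin(2.0 * t))


def mean_l1(pred, target):
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def polarized_rendering_loss(pred_stokes, target_stokes):
    """Mean L1 error over the renderings for every filter angle."""
    total = 0.0
    for angle in FILTER_ANGLES_DEG:
        pred = [filtered_intensity(s, angle) for s in pred_stokes]
        target = [filtered_intensity(s, angle) for s in target_stokes]
        total += mean_l1(pred, target)
    return total / len(FILTER_ANGLES_DEG)


def total_loss(pred, target):
    """Weighted sum of the map distance and the polarized rendering loss."""
    map_term = mean_l1(pred["scalars"], target["scalars"])
    normal_term = sum(cosine_distance(p, t)
                      for p, t in zip(pred["normals"], target["normals"]))
    normal_term /= len(pred["normals"])
    render_term = polarized_rendering_loss(pred["stokes"], target["stokes"])
    return MAP_WEIGHT * (map_term + normal_term) + RENDER_WEIGHT * render_term
```

In an actual training setup these terms would be computed on image tensors inside an automatic-differentiation framework so that gradients reach the network; the plain-Python form here is only meant to make the structure of the loss explicit.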
If the object is a planar object, uniform illumination can be achieved using an extended or a sufficiently large area-light source or light panel or display panel, or locally- uniform environmental illumination incident on the object at near normal incidence or obliquely incident at near Brewster angle of incidence. For a planar object, the uniform illumination may be unpolarized, linearly polarized or circularly polarized.
According to a second aspect of the present invention there is provided a method comprising receiving a set of linearly-polarized color images of an object, each linearly- polarized image having a different angle of polarization, the linearly-polarized color images having the same view of the object and acquired using unpolarized, frontal, flash illumination of the object. The method may optionally include receiving a reflectance map and a shape map for the object generated from the set of linearly- polarized images. The method comprises generating three-dimensional shape and spatially- varying reflectance of the object from the set of linearly-polarized images, and optionally the reflectance map and the shape map, using a deep neural network trained with a synthetic or measured dataset, wherein the synthetic or measured dataset includes a plurality of sets of data, each set of data including a set of linearly-polarized images having different polarizations, and optionally a reflectance map and a shape map generated from the linearly-polarized images, and ground truth three-dimensional shape and spatially-varying reflectance and storing said three-dimensional shape and spatially-varying reflectance generated by the deep neural network
According to a third aspect of the present invention there is provided a computer program comprising instructions for performing the method of the first or second aspect.
According to a fourth aspect of the present invention there is provided a computer program product comprising a computer readable medium (which may be non-transitory) storing the computer program of the third aspect.
According to a fifth aspect of the present invention there is provided a device comprising at least one processor and storage. The at least one processor is configured, in response to receiving a set of at least three images of an object including at least two linearly-polarized images and at least one color image, wherein the three images have the same view of the object and are acquired under the same illumination condition in which either diffuse polarization or specular polarization dominates, wherein a set of Stokes parameters s0, s1 and s2 is determinable from the at least three images, to generate three-dimensional shape and spatially-varying reflectance of the object from the set of at least three images using a deep neural network trained with a plurality of sets of training images, each of the plurality of sets of training images including at least three training images including at least two linearly-polarized training images and at least one color image from which a respective set of Stokes parameters s0, s1 and s2 is determinable, and to store said three-dimensional shape and spatially-varying reflectance generated by the deep neural network in the storage.
The at least one processor may receive a polarization shape map generated from the Stokes parameters s1 and s2 for the object and/or a colour map and/or a degree of polarization (DOP) map. The at least one processor may further be configured to generate a polarization shape map from the Stokes parameters s1 and s2 for the object and/or a colour map and/or a DOP map using the set of at least three color images.
The at least one processor may be configured to generate three-dimensional shape and spatially-varying reflectance from the set of at least three color images and the polarization shape map and/or the colour map and/or the DOP map.
The device may further comprise a color digital camera and a linear polarizing filter for acquiring the at least three color images.
The device may further comprise or be provided with a flash or a projector for providing directional illumination on the object, preferably from a frontal direction.
The device may further comprise or be provided with one or more light sources (for example, light emitting diodes, light panels or display panels) and, optionally, one or more reflecting surfaces arranged around the object to provide uniform illumination on the object. Light from the one or more light sources may be bounced from the one or more reflecting surface(s).
The one or more light sources may comprise a plurality of light sources arranged in a hemisphere or sphere around the object. The one or more reflecting surfaces may comprise a plurality of reflecting surfaces arranged in a hemisphere or sphere around the object. The reflecting surface(s) may be concave. The reflecting surface(s) may provide diffuse reflection.
The at least one processor may include one or more central processing units (CPUs). The at least one processor may include one or more graphical processing units (GPUs).
According to a sixth aspect of the present invention there is provided a method of training a deep neural network. The method comprises providing a plurality of sets of training images and corresponding ground truth three-dimensional shape and spatially-varying reflectance of objects to a deep neural network, each set of training images including at least three training images including at least two linearly-polarized training images (for example, at least two linearly-polarized color images) and at least one color image (which may or may not be linearly-polarized) from which a respective set of Stokes parameters s0, s1 and s2 is determinable; and storing the trained deep neural network.
The method may further comprise providing a polarization shape map generated from the Stokes parameters s1 and s2 and/or a colour map and/or a DOP map.
The set of training images may comprise a plurality of sets of synthesized training images and/or measured training images.
According to a seventh aspect of the present invention is provided a computer program comprising instructions for performing the method of the sixth aspect.
According to an eighth aspect of the present invention there is provided a computer program product comprising a computer readable medium (which may be non-transitory) storing the computer program of the seventh aspect.
According to a ninth aspect of the present invention there is provided apparatus comprising at least one processor and storage for training a deep neural network.
Brief Description of the Drawings
Certain embodiments of the present invention will now be described, by way of example, with reference to the accompanying drawings, in which:
Figure 1 is a block diagram of a system for estimating object shape and spatially-varying bidirectional reflectance distribution function (SVBRDF) from polarization cues, the system including a trained deep network;
Figure 2A illustrates a system for capturing polarized images;
Figure 2B illustrates captured polarized images (or "captured inputs");
Figure 2C illustrates computed explicit cues (or "computed inputs");
Figure 2D illustrates synthetic training data used to train the deep network;
Figure 3 illustrates shape and SVBRDF estimated for an object in the form of normal, diffuse, specular, roughness and depth maps;
Figure 4 is a process flow diagram of a method of estimating object shape and SVBRDF;
Figure 5 is a process flow diagram of a method of training a deep network;
Figure 6A is an ideal normalized Stokes map for a sphere under frontal flash illumination;
Figure 6B is a practical signal captured with a measured Stokes map of a rubber ball with embossed text under flash illumination;
Figures 7A and 7B illustrate specular reflection on planar surfaces, namely a brick wall and a color chart, due to unpolarized sky acquired at oblique angle of incidence using a linear polarizer in front of a camera at horizontal 0° orientation ("Max") and at vertical 90° orientation ("Min") respectively;
Figure 7C is degree of polarization computed from Max and Min and which appears to contain cues about the surface specular roughness;
Figure 8 illustrates a deep network architecture which has a general U-Net form and in which decoders are divided into three different branches, each handling a related set of output map(s), namely normal and depth, diffuse albedo, roughness and specular albedo, and in which res-blocks are introduced on the skip connections between the encoder and the different branches of the decoder allowing the network to adapt the information forwarded to the different branches of the decoder;
Figure 9 illustrates comparisons of results on synthetic data produced by the method herein described and those produced by methods described in Li et al. and Boss et al.;
Figure 10 illustrates comparisons of results on real objects produced by the method herein described and those produced by the method described in Li et al.;
Figure 11A illustrates a mixed Stokes map of a ball under complex lighting;
Figure 11B illustrates an example of an exploitable pure Stokes map of a lemon;
Figures 12A and 12B illustrate plots of angle of incidence (x-axis) versus measured diffuse degree of polarization (DOP) (y-axis) for two spherical balls;
Figure 12C illustrates a plot of angle of incidence (x-axis) versus simulated DOP using a polynomial fit to measured data;
Figures 13A, 13B, 13C and 13D illustrate simulated linearly-polarized images at 0°, 45°, 90° and 135° respectively of a sphere having a surface comprised of tiled, green stone material acquired using frontal flash illumination;
Figure 14 is a simulated normalized color map of the sphere shown in Figures 13A to 13D; and
Figure 15 is a simulated Stokes map obtained from the linearly-polarized images shown in Figures 13A to 13D illustrating dominance of diffuse polarization which is independent of the polarization state of flash illumination.
Detailed Description of Certain Embodiments
Introduction
Practical acquisition of shape and spatially varying reflectance of three-dimensional (3D) objects is herein described which can recover the appearance of an object, for example, under different lighting conditions. The method employs acquiring polarization images with frontal flash illumination and exploits polarization cues in conjunction with deep learning. A high-dynamic range (HDR) synthetic dataset is created by simulating polarization behaviour on different geometries and spatially varying bi-directional reflectance distribution functions (SVBRDFs) which is used to train a deep network using supervised learning. This can then be used to estimate the 3D shape as surface normal and depth maps, and spatially varying reflectance properties, in the form of diffuse and specular albedo maps and specular roughness map. This enables high-quality renderings of acquired objects under new lighting conditions.
Overview
Figure 1 shows a system 1 which can be used to capture polarised color images of an object 2, to use the captured images to compute further cues and to estimate object shape and SVBRDF using a trained deep network. Figure 1 also shows a system 3 which is used to generate synthetic training data which is used to train the deep network.
Referring to Figures 1 and 2A, an image-capturing system 4 for capturing linearly-polarized images 5 (or "captured inputs") of an object 2 includes a color digital camera 6 (herein simply referred to as a "camera"), a linear polarizing filter 7 mounted on the lens unit 8 of the camera 6, and a light source 9 in the form of an unpolarized flash. The camera 6 may be stably mounted on a tripod 10 (not shown in Figure 1). A color checker chart 11 for white balancing and radiometric calibration of the observed reflectance may be provided. The color checker chart can be omitted, particularly if the measurements are pre-calibrated.
Off-the-shelf equipment can be used for the image-capturing system 4. In particular, the camera 6 takes the form of a digital single-lens reflex (DSLR) camera, although other forms of digital cameras can be used. For example, a camera may be used having an integrated polarization sensor such as a Sony (RTM) Polarsens (RTM), thereby obviating the need for an external linear polarizing filter. Images need not be captured under laboratory or studio conditions, provided flash illumination is the dominant illumination. The object may be all or part of a human subject, such as the face or body, an animal or a plant.
Referring to Figures 1, 2B and 4, three images 51, 52, 53 are captured which allow Stokes parameters s0, s1, s2 to be found (step S1). In this case, first, second and third linearly-polarized images 51, 52, 53 are captured, with the angle of polarization of the filter 7 set to 0°, 90° and 45° respectively. A fourth polarized image 54 can be captured with the angle of polarization set to 135°. Alternatively, the image 54 can be constructed from the other images 51, 52, 53. Other angles for polarization can be used. For example, the first, second and third linearly-polarized images 51, 52, 53 have angles of polarization set to 0°, 60° and 120° respectively. Not all the captured images 51, 52, 53 need be linearly-polarized. For example, the first and second images 51, 52 may be linearly-polarized having angles of polarization set to 0° and 45° and the third image 53 may be unpolarized.
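By way of illustration only, recovery of the Stokes parameters from three arbitrary filter orientations can be sketched per pixel as follows. The sketch assumes the ideal-polarizer intensity model I(θ) = ½(s0 + s1 cos 2θ + s2 sin 2θ), which is consistent with equation (1) hereinafter but is not stated in this form in the description, and the function names are hypothetical. It also shows how a 135° image may be constructed from the 0°, 45° and 90° images.

```python
import math

def intensity_at(angle_deg, stokes):
    """Intensity behind an ideal linear polarizer (assumed model)."""
    s0, s1, s2 = stokes
    t = math.radians(2.0 * angle_deg)
    return 0.5 * (s0 + s1 * math.cos(t) + s2 * math.sin(t))

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def stokes_from_angles(angles_deg, intensities):
    """Solve for (s0, s1, s2) of one pixel from three measurements
    taken at three distinct polarizer angles (Cramer's rule)."""
    rows = []
    for a in angles_deg:
        t = math.radians(2.0 * a)
        rows.append([0.5, 0.5 * math.cos(t), 0.5 * math.sin(t)])
    d = det3(rows)
    if abs(d) < 1e-12:
        raise ValueError("angles do not determine s0, s1 and s2")
    solution = []
    for col in range(3):
        m = [row[:] for row in rows]
        for r in range(3):
            m[r][col] = intensities[r]
        solution.append(det3(m) / d)
    return tuple(solution)

def synthesize_135(i_0, i_45, i_90):
    """Fourth image from the other three: I135 = I0 + I90 - I45."""
    return i_0 + i_90 - i_45
```

For example, measurements at 0°, 60° and 120° of a pixel with known Stokes parameters are inverted back to those parameters by `stokes_from_angles`.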
The same illumination condition is used to capture the images. In other words, for each image, the object is illuminated in the same way from the same, single fixed point, i.e., the flash, which is in a fixed position. Expressed differently, multiple different illumination conditions are not used for the set of (three) images, for example, by positioning the flash in different positions or by using another flash in a different position for a different image acquisition when acquiring each respective image. As will be explained in more detail hereinafter, single, frontal flash illumination, however, need not be used. Instead, the same illumination conditions can be provided by multiple fixed points (such as a spherical or hemispherical array of light sources) or from the same fixed extended range of illumination (such as light panels) or other fixed illumination arrangements. The same or substantially the same illumination light intensity is preferably used.
Referring to Figures 1, 2C and 4, an image processing system 12, which may be implemented in software on a processor-based computer system (not shown), can be used to generate computed images 14 (or "further cues") from the captured images 5 (step S2). The computed images 14 include a normalised color map 141 and a polarization shape map 142 (or "π-ambiguous shape map") in which the shape cue is computed from the horizontally polarized reflectance s1 and 45° polarization reflectance s2 and which takes the form of a normalised Stokes map 142. The computed images 14 can also include an unpolarized flash image (or "s0 image").
Referring to Figures 1, 2D, 3 and 4, a data processing system 15, which may be implemented in software, hardware, a field programmable gate array (FPGA), or using a graphics processing unit (GPU), in a computer system (not shown), is used to estimate the 3D shape and SVBRDF 18, specifically a normal map 181, a diffuse map 182, a roughness map 183, a specular map 184 and a depth map 185 (step S3). The shape and SVBRDF 18 can be stored in storage 19 (step S4) and subsequently used by a rendering system 20 to display the object 2 on a display 21 (step S5).
The data processing system 15 implements a deep network 22 which is trained using training data 23 and which generates the appearance- and shape-related maps 18 from the captured inputs 5 and optionally the computed inputs 14. Linear polarization cues in surface reflectance are used to provide strong initial cues to the deep network 22. While polarization imaging close to the Brewster angle allows extraction of many appearance cues directly, this can generally only be done reliably for planar surfaces, and reference is made to Riviere et al. ibid. Accordingly, deep learning is used to compensate for the limitations of the polarization signal over the surface of a 3D object 2.
The training data 23 can take the form of synthetic training data, measured training data (or "real training data"), or a mixture of synthetic and measured training data. Synthesizing training data can help to generate a large volume of training data more quickly than acquiring measured training data. Referring in particular to Figure 1, the deep network 22 includes a parallel arrangement of a U-Net image-to-image network 29 (hereinafter referred to simply as the "U-Net") and a global features network 30 similar to that described in Deschaintre et al. ibid., which is incorporated herein by reference.
The U-Net 29 is trained to employ polarization images 5 of the object 2 as input along with explicit cues 14 provided by the polarization signal 5, and to output five maps 18 related to appearance and shape, namely diffuse and specular albedo 182, 184, specular roughness 183, surface normal 181 and depth 185. From the acquired polarization information, two specific cues 141, 142 (i.e., channels of information) are computed to provide as additional input to the deep network 22. The first is a reflectance cue 141 in the form of normalized diffuse color computed by normalizing the reflectance minima obtained (through sinusoidal fitting) from the acquired polarized images. The second is a shape cue 142, in particular a π-ambiguous shape map, in the form of a normalized Stokes map. The normalized Stokes map encodes the self-normalized s1, s2 components of the Stokes parameters of linear polarization and captures the normalized variation in the reflectance under different polarization filter orientations, providing a π-ambiguous initialization for surface normals. An angle of polarization map computed from s1, s2 could be used instead of the normalized Stokes map as a shape cue.
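A minimal per-pixel sketch of the angle of polarization computed from s1 and s2 is given below for illustration. The half-angle arctangent is the standard definition; the helper names are hypothetical and are not taken from the described implementation. The sketch also indicates why the cue is π-ambiguous: orientations φ and φ + π produce identical s1 and s2.

```python
import math

def angle_of_polarization(s1, s2):
    """Angle of polarization in radians, in the range (-pi/2, pi/2]."""
    return 0.5 * math.atan2(s2, s1)

def linear_stokes_for_orientation(phi, magnitude=1.0):
    """(s1, s2) produced by linear polarization oriented at angle phi.
    Hypothetical helper used only to demonstrate the ambiguity."""
    return magnitude * math.cos(2.0 * phi), magnitude * math.sin(2.0 * phi)

# Orientations phi and phi + pi are indistinguishable from s1 and s2 alone.
phi = 0.3
a = linear_stokes_for_orientation(phi)
b = linear_stokes_for_orientation(phi + math.pi)
same_signal = all(abs(x - y) < 1e-12 for x, y in zip(a, b))
```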
To train the deep network 22, a synthetic dataset 23 is created (by the generator 24) consisting of 20 complex 3D geometries of realistic objects mapped with procedurally and artistically generated SVBRDFs based on a dataset disclosed in V. Deschaintre et al.: "Guided fine-tuning for large-scale material transfer", Computer Graphics Forum (Proceedings of the Eurographics Symposium on Rendering), volume 39, no. 4 (2020). Other combinations can be used. For example, other, different 3D geometries can be used, other different numbers of geometries and other, different materials can be used, and/or another different SVBRDF dataset can also be employed for creating the training dataset. Specialised decoder branches 331, 332, 333 (Figure 8) are employed in the network 22 to output high-quality shape and reflectance parameter maps, and a mix of L1 and rendering loss is used to train the network 22. Rendering loss is further improved by developing a differentiable polarized renderer, providing better gradients on the diffuse and specular behaviours.
The image-capturing system 4 (i.e., the camera 6, the polarizer 7, the lens 8 and the flash 9), the image processing system 12, the data processing system 15 and the rendering system 20 may be integrated into one device.
Method
Data generation
Referring to Figures 1, 2D and 5, leveraging polarization cues with a deep network 22 requires a large dataset of objects to be captured with different polarizer orientations 27, 28 along with ground truth SVBRDF 29. Measuring such a large dataset would require advanced, expensive equipment and considerable time, although this approach can be used. Instead, synthetic data rendering is used to create a dataset 23 of over 100,000 sets of images (step T1).
The training dataset 23 is generated using 20 complex meshes of realistic objects and 2000 different materials (SVBRDFs). The test dataset 23 uses 6 unique meshes and 30 materials. For each set of polarization images in the training set 27, a mesh and material are selected and randomly rotated to augment diversity of the training data.
Renderings are generated for four polarization filter angles, namely 0°, 45°, 90° and 135°, and the s0 image, alongside the ground truth SVBRDF and depth maps. The dataset is further augmented with a normalized Stokes map and normalized diffuse color that are computed from the different polarized renderings. Optionally, the dataset could also be augmented with a degree of polarization (DOP) map. Referring to Figures 6A and 6B, a perfect Stokes map does not occur in real acquisition.
Figure 6A shows an ideal normalized Stokes map for a sphere under frontal flash illumination. In the RGB color coding used for Stokes vectors, R (s0) is set to 0.5, and G (s1) and B (s2) are normalised and mapped to the 0 - 1 range for visualisation. Figure 6B illustrates the signal captured in practice with a measured Stokes map of a rubber ball with embossed text under flash illumination.
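The color coding described above can be sketched per pixel as follows, purely by way of illustration. It is assumed here that s1 and s2 are normalised by their joint magnitude and that the resulting values in the range -1 to 1 are mapped linearly to 0 to 1; the function name and the handling of pixels with no polarization signal are hypothetical.

```python
import math

def stokes_to_rgb(s1, s2, eps=1e-8):
    """Encode one pixel of a normalized Stokes map as an RGB triple.
    R is fixed at 0.5; G and B carry normalized s1 and s2 (assumed mapping)."""
    magnitude = math.hypot(s1, s2)
    if magnitude < eps:
        # No usable polarization signal: neutral grey (an assumption).
        return (0.5, 0.5, 0.5)
    g = 0.5 * (s1 / magnitude + 1.0)
    b = 0.5 * (s2 / magnitude + 1.0)
    return (0.5, g, b)
```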
Synthetic generation is augmented with Gaussian noise to mimic the perturbation in the acquisition process. To better benefit from polarization cues, HDR data capture is simulated and 16-bit portable network graphics (PNG) images are used. Figure 2D gives examples of the synthetic dataset 23.
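For illustration, the noise augmentation and 16-bit quantization of a rendered image can be sketched as below. The noise standard deviation, the clamping of negative values and the choice of white level are assumptions made for the sketch; the description does not specify them, and the function names are hypothetical.

```python
import random

def add_gaussian_noise(pixels, sigma, rng):
    """Add zero-mean Gaussian noise to linear pixel values.
    Negative results are clamped to zero (an assumption)."""
    return [max(0.0, p + rng.gauss(0.0, sigma)) for p in pixels]

def quantize_16bit(pixels, white_level=1.0):
    """Map linear values to 16-bit integer codes in the range 0..65535."""
    codes = []
    for p in pixels:
        normalized = min(max(p / white_level, 0.0), 1.0)
        codes.append(int(normalized * 65535 + 0.5))
    return codes

rng = random.Random(0)  # fixed seed so the sketch is repeatable
noisy = add_gaussian_noise([0.0, 0.25, 0.5, 1.0], sigma=0.01, rng=rng)
codes = quantize_16bit(noisy)
```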
Polarization information
Stokes parameters
The polarization state of reflected light gives useful cues about the surface normal. The transformation of the Stokes parameters upon reflection largely depends on the normal of the surface. Measuring the reflected Stokes parameters under unpolarized light (e.g., flash illumination) can be achieved using three observations with the linear polarizing filter set to 0°, 45° and 90°. These three images, named I_H, I_45 and I_V, can be used to calculate the Stokes parameters of linear polarization per pixel with the following equations:
s0 = I_H + I_V
s1 = I_H - I_V          (1)
s2 = 2 * I_45 - s0
Here, s0 represents the unfiltered reflectance, s1 represents the horizontally polarized reflectance, and s2 represents the 45° polarization reflectance. Directly-measured Stokes parameters depend on the bidirectional reflectance distribution function (BRDF) of the surface and the lighting conditions. s1 and s2 are normalised with respect to each other to extract the directional information about the surface normal up to a π ambiguity. Normalized Stokes parameters are used as an additional cue for the network, helping to disambiguate the shape from the reflectance, improving shape and SVBRDF acquisition.
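Equation (1) and the subsequent normalisation can be sketched per pixel as follows. The sketch assumes that normalising s1 and s2 "with respect to each other" means dividing both by the magnitude of the vector (s1, s2), which is one plausible reading; the function names are hypothetical.

```python
import math

def stokes_parameters(i_h, i_45, i_v):
    """Equation (1): linear Stokes parameters of one pixel from the
    0 degree (I_H), 45 degree (I_45) and 90 degree (I_V) observations."""
    s0 = i_h + i_v
    s1 = i_h - i_v
    s2 = 2.0 * i_45 - s0
    return s0, s1, s2

def normalize_s1_s2(s1, s2, eps=1e-8):
    """Scale (s1, s2) to unit length, keeping only directional information.
    Returns (0, 0) where there is no measurable polarization."""
    magnitude = math.hypot(s1, s2)
    if magnitude < eps:
        return 0.0, 0.0
    return s1 / magnitude, s2 / magnitude
```

Applied to every pixel and color channel of the three captured images, the second function would yield the normalized Stokes map used as the shape cue.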
In the general case, measured Stokes parameters consist of a mix of contributions from specular and diffuse polarization caused by their respective reflectance. These two types of polarization are captured by the Fresnel equations on surface reflectance and transmission for specular and diffuse polarization respectively. The magnitude of specular polarization usually dominates under direct area illumination. This tends to be the reason why previous approaches to polarization under controlled spherical illumination modelled only specular polarization. Reference is made to A. Ghosh et al.: "Circularly polarized spherical illumination reflectometry", ACM Trans. Graph. (Proc. SIGGRAPH Asia), vol. 29, pp. 162:1-162:12 (2010) and G. C. Guarnera et al.: "Estimating surface normals from spherical stokes reflectance fields", ECCV Workshop on Color and Photometry in Computer Vision, pages 340-349 (2012). On the other hand, due to the use of frontal flash illumination, the direct specular reflection is limited to a very small frontal patch, and most of the object surface instead exhibits diffuse polarization. Therefore, the normalized Stokes map is modelled as the result of diffuse polarization in the synthetic training data 23. Under more complex environmental illumination, an arbitrary mixture of specular and diffuse polarization can be observed, which is not currently modelled synthetically.
Diffuse color
The polarization measurements are also employed to compute an estimate of normalized diffuse color. Rotating a linear polariser 7 (Figure 1) in front of the camera lens 8 (Figure 1) changes the observed intensity, as the specular reflection reaches its minimum when the polariser axis is parallel to the plane of incidence. As the flash light is white and the residual specular signal is weak, it is possible to extract an estimate of the normalized diffuse color.
In practice, the minimum intensity does not necessarily fall exactly at one of the three polarization angles captured. Therefore, a sinusoidal fitting per pixel is performed by the image processing system 12 over the observations (I_H, I_V and I_45) to find the minimum value. The minimum reflectance values are normalised to extract the normalized diffuse color, which is provided to the network as a reflectance cue. This color information can, however, be lost in some over-saturated pixels caused by the extreme dynamic range of flash illumination, despite HDR imaging, and may require image in-painting to fill in the saturated pixels.
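The per-pixel sinusoidal fitting can be sketched as follows for illustration. With exactly three observations at 0°, 45° and 90°, a sinusoid of the form I(θ) = a + b cos 2θ + c sin 2θ passes through all three values, so the fit has a closed form. Normalising the per-channel minima by their Euclidean norm is an assumption, as the description does not state which normalisation is used, and the function names are hypothetical.

```python
import math

def fit_sinusoid_extrema(i_h, i_45, i_v):
    """Fit I(theta) = a + b*cos(2*theta) + c*sin(2*theta) to the three
    observations and return (minimum, maximum) over all filter angles."""
    a = 0.5 * (i_h + i_v)   # mean level
    b = 0.5 * (i_h - i_v)   # cosine coefficient
    c = i_45 - a            # sine coefficient
    amplitude = math.hypot(b, c)
    return a - amplitude, a + amplitude

def normalized_diffuse_color(rgb_h, rgb_45, rgb_v, eps=1e-8):
    """Normalize the per-channel fitted minima of one pixel to unit length."""
    minima = [fit_sinusoid_extrema(h, m, v)[0]
              for h, m, v in zip(rgb_h, rgb_45, rgb_v)]
    norm = math.sqrt(sum(x * x for x in minima))
    if norm < eps:
        return [0.0 for _ in minima]
    return [x / norm for x in minima]
```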
Degree of Polarization (DOP)
The above sinusoidal fitting to the measurements can also be used to compute the maximum reflectance value which in conjunction with the minimum reflectance value can be used to compute the degree of polarization (DOP) of reflectance as:
DOP = (maximum - minimum)/ (maximum + minimum) (2)
DOP can encode some shape information for a 3D object.
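Equation (2) can be sketched per pixel as below. The second function is an equivalent form in terms of the linear Stokes parameters, which follows if the fitted sinusoid is written as I(θ) = ½(s0 + s1 cos 2θ + s2 sin 2θ); that form is an assumption consistent with equation (1) rather than a statement from the description, and the function names are hypothetical.

```python
import math

def dop_from_extrema(i_max, i_min, eps=1e-8):
    """Equation (2): DOP = (maximum - minimum) / (maximum + minimum)."""
    total = i_max + i_min
    if total < eps:
        return 0.0
    return (i_max - i_min) / total

def dop_from_stokes(s0, s1, s2, eps=1e-8):
    """Equivalent degree of linear polarization: sqrt(s1^2 + s2^2) / s0."""
    if s0 < eps:
        return 0.0
    return math.hypot(s1, s2) / s0
```

For example, a pixel with s0 = 1.0, s1 = 0.3 and s2 = 0.4 has fitted extrema of 0.75 and 0.25, and both functions return a DOP of 0.5.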
The DOP increases with increasing angle of incidence for diffuse polarization, as illustrated in, for example, Figure 12. Referring to Figures 7A, 7B and 7C, for a planar object (such as a brick wall) illuminated by unpolarized light from the sky and acquired at an oblique angle of incidence using a linear polarizer in front of a camera oriented at 0° and 90°, DOP due to specular polarization can also encode surface reflectance information related to specular roughness.
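The trend of diffuse DOP with angle can be illustrated with the Fresnel-based diffuse polarization model that is commonly used in the shape-from-polarization literature. This model is not taken from the present description, which instead refers to a polynomial fit to measured data (Figure 12C); the refractive index of 1.5 is likewise an assumption typical of dielectrics, and the function name is hypothetical.

```python
import math

def diffuse_dop(theta_deg, n=1.5):
    """Diffuse degree of polarization for a dielectric of refractive
    index n, with theta_deg the angle between the surface normal and
    the viewing direction (standard Fresnel-transmission model)."""
    theta = math.radians(theta_deg)
    sin2 = math.sin(theta) ** 2
    numerator = (n - 1.0 / n) ** 2 * sin2
    denominator = (2.0 + 2.0 * n ** 2
                   - (n + 1.0 / n) ** 2 * sin2
                   + 4.0 * math.cos(theta) * math.sqrt(n ** 2 - sin2))
    return numerator / denominator

# DOP sampled every 10 degrees from normal to grazing viewing.
curve = [diffuse_dop(angle) for angle in range(0, 91, 10)]
```

Under these assumptions the curve starts at zero for a surface facing the camera and rises monotonically towards grazing angles, which is qualitatively the behaviour described above.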
Network architecture
Referring to Figure 8, to estimate the shape and spatially varying reflectance of an object using the acquisition method, the deep network 22 is trained to output diffuse and specular albedos 182, 184, specular roughness 183, normal map 181 and depth map 185 of the input object 2 (Figure 1). An encoder-decoder architecture 31, 32 is employed. The decoder architecture 32 is split into three branches 331, 332, 333, each specialized in an aspect of shape or appearance. The specular albedo and roughness maps 184, 183 are grouped in one branch 333 and the normal and depth maps 181, 185 are grouped in another 331 as they are closely related. Finally, a third branch 332 handles the diffuse albedo 182. All three branches 331, 332, 333 of the decoder 32 receive the same inputs from the encoder 31, but the skip connections 34 are made more flexible. In particular, two res-blocks 35, 36 and a convolution layer 37 are added to the skip connections, allowing the training process to adjust the information transferred to each decoder branch 331, 332, 333 from the encoder 31. The res-blocks 35, 36 on the skip connections 34 allow the network 22 to forward the most relevant information to each separate decoder branch 331, 332, 333, helping to decorrelate the diffuse response from the other parameters. This can help to preserve high-frequency features in all of the reflectance and shape maps, as each of these has a different scale and dynamic range, as well as decorrelating the details in the predicted maps from each other. The network is trained on 512 x 512 images.
Polarization rendering loss
The network 22 is trained using two losses, namely an L1 loss to regularize the training, computing an absolute difference between the output maps and the targets, and a polarized rendering loss. The rendering loss used by Deschaintre et al. only computes losses (i.e., errors) for standard renderings based on predicted versus ground truth reflectance and shape maps. Polarized rendering loss computes losses (i.e., errors) for more sophisticated renderings that include specular and diffuse polarization simulations. Rendering losses can be efficient in training reflectance acquisition methods. These are improved by simulating the polarization behaviour of surface reflectance in a differentiable fashion, allowing gradients of rendering effects from diffuse and specular polarization to be taken into account in the training process.
Acquisition procedure
Referring again to Figures 1 and 2A, the acquisition process involves capturing an object 2 under flash illumination with three polarization filter orientations, namely 0°, 45° and 90°. As explained earlier, a DSLR camera 6, a tripod 10 and a linear polarizing filter 7 are used and the polarizer 7 is manually rotated on the lens 8 to acquire the data 5. However, polarization sensors, e.g., Sony (RTM) Polarsens (RTM), can be used which allow rapid capture of this information in a single shot. A small color checker 11 next to the captured object 2 is used for white balancing, and HDR capture, using auto-exposure bracketing on the camera, is used to better extract the polarization information and match the object appearance as closely as possible. The acquisition process takes around a minute.
A typical acquisition scene is illustrated in Figure 2A.
Evaluation
As explained earlier, polarization imaging and flash illumination are used to recover 3D object shape and SVBRDF. To provide comparisons, the results of Li et al. ibid. and Boss et al. ibid. are used as comparative examples since the methods described therein target similar outputs with regular photographs under flash illumination.
Comparisons
Quantitative comparisons
The method herein described is quantitatively compared to Li et al. ibid. and Boss et al. ibid. using L1 distance. The errors on the normal maps, on the depth maps and directly on renderings are evaluated, as these are not affected by the different BRDF models chosen by the different methods. This numerical evaluation is performed on 250 combinations of 6 randomly rotated meshes and 30 SVBRDFs. The rendering error is computed over 20 renderings for each result with varying light properties. Table 1 below shows that the method strongly benefits from the polarization cues, white balancing and HDR imaging, with significantly lower error on depth, normal and renderings.
Table 1
[Table 1: image imgf000022_0001, not reproduced in this text]
The method herein described and those of Li et al. ibid. and Boss et al. ibid. are evaluated using the synthetic test set. The normal error is reported in degrees, while the rest is reported as L1 distance. For all parameters, a lower value is better. 20 renderings with different illumination are compared for each result, rather than the parameter maps, as the material models used by these methods vary. The method can be seen to leverage white balance, HDR inputs and polarization cues, producing significantly better results on the complex shapes.
Qualitative comparisons
For qualitative comparison, the method herein described is evaluated against Li et al. ibid. and Boss et al. ibid. on synthetic data and on real data, i.e., ground truth (or "GT").
Figure 9 shows a comparison based on synthetic test data. By leveraging polarization information, the method produces more plausible results and better captures the appearance of the input. While the re-renderings (far right column) and shape can be directly compared, the BRDF parameter maps are provided for qualitative evaluation as different BRDF models are used by the different methods. The inputs are adapted to each method and the published codes for Li et al. ibid. and Boss et al. ibid. are used to generate results.
Due to the polarization cues, the method captures the global 3D shape of the object much better than single-image methods. An important distinction over each of these is that the method does not correlate the SVBRDF variation in the input to normal variation in the output, as the Stokes map disambiguates this information.
Figure 10 shows results on real objects. The method better recovers the global shape of the object as well as its appearance, showing that it generalizes well to real acquisition. This is particularly seen in the rendering under a new flash lighting direction where the results using the method demonstrate appropriate shading variation due to the estimated surface normal and reflectance maps.
Ablation study
Components are evaluated by removing them one at a time. The error is quantitatively evaluated and reported in Table 2 below.
Table 2
[Table 2: image imgf000023_0001, not reproduced in this text]
The contribution of the different technical components computed over the test set is evaluated. For each column, training was performed without the component, namely (a) improved skip connections, (b) polarized rendering loss and (c) polarization cues. The normal error is reported in degrees, while the rest are reported as an L1 distance. For all parameters, a lower value is better. The use of both improved skip connections and polarized rendering loss improves results, but most importantly the polarization cues significantly improve the results on all recovered properties.
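The two kinds of error measure reported in the tables can be sketched as follows for illustration. Averaging over pixels is an assumption, since the description does not state whether sums or means are reported, and the function names are hypothetical.

```python
import math

def mean_angular_error_deg(pred_normals, gt_normals):
    """Mean angle in degrees between predicted and ground truth normals,
    each given as a list of (x, y, z) tuples. Non-zero vectors assumed."""
    total = 0.0
    for p, g in zip(pred_normals, gt_normals):
        dot = sum(a * b for a, b in zip(p, g))
        norm_p = math.sqrt(sum(a * a for a in p))
        norm_g = math.sqrt(sum(b * b for b in g))
        cosine = max(-1.0, min(1.0, dot / (norm_p * norm_g)))
        total += math.degrees(math.acos(cosine))
    return total / len(pred_normals)

def l1_distance(pred, target):
    """Mean absolute difference between two equally sized value lists."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)
```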
Improved skip connections
The first column of Table 2 evaluates the method with standard skip connections. The res-blocks 35, 36 (Figure 8) on the skip connections allow the network 22 (Figure 8) to forward the most relevant information to each separate decoder branch 331, 332, 333 (Figure 8), helping to decorrelate the diffuse response from the other parameters. Such a correlation effect is visible in Figure 9 in the result of Li et al., for example.
Polarized rendering loss
The second column of Table 2 evaluates the method with a rendering loss similar to V. Deschaintre et al. ibid. The differentiable polarized renderings that are implemented help the network to better separate the diffuse and specular signals, with a small improvement in the roughness and specular maps, but mostly in de-lighting the diffuse albedo.
Polarization cues
The third column of Table 2 evaluates the method with a single HDR, white-balanced flash input without any polarization information. All the recovered parameters suffer significantly from the absence of polarization cues. The rendering error of this single-image variant is nevertheless found to be lower than that of the compared methods, which can be attributed to the use of a white-balanced HDR input and to training on complex meshes, which helps to recover the global curvature.
Limitations
The method is currently limited to flash illumination, where the polarization signal is dominated by diffuse polarization. The more general case of acquisition in arbitrary environmental illumination, including outdoor illumination, is more challenging due to the potentially complex mixing of specular and diffuse polarization signals. Referring to Figures 11A and 11B, it was found in experiments that this can result in inconsistent cues with strong discontinuities in the Stokes map, as shown in Figure 11A. This inconsistency comes from the different light sources and inter-reflections composing the illumination on a 3D object in the wild. Useful information can be retrieved in some cases where specular polarization dominates, providing a cleaner signal similar to the flash illumination case, as shown in Figure 11B. Thus, whereas a flash illumination arrangement can be used to exploit a clean signal of diffuse polarization for shape and reflectance estimation, other illumination arrangements, such as an inwardly-directed circular or spherical array of light panels, for example as described in US 2021/05015 A1, can be used to exploit a clean signal of specular polarization for shape and reflectance estimation. In the latter case, the deep network needs to be trained accordingly on specular polarization cues.
In principle, the method is limited to acquiring dielectric objects, as the information extracted through polarization cues is valid for dielectrics. Metals polarize light elliptically. The dielectric assumption can still hold in practice for some metallic surfaces in the real world (metal-dielectric composites, weathering effects), and the acquisition approach should apply in such cases. The method is able to provide high-quality estimates of surface normal and depth, as well as specular roughness. However, the diffuse albedo estimates, in some cases, have a few specular highlights baked in due to saturation of the flash illumination during data capture (image in-painting can help in these saturated pixels).
Supplementary material
Background
Stokes parameters
The Stokes parameters are a set of values that describe the polarization state of light in terms of its total intensity
Figure imgf000025_0001
. DOP (degree of polarization,
Figure imgf000025_0002
) and the shape parameters of the polarization ellipse. The Stokes parameters consist of four components:
Figure imgf000025_0003
where s0 is the total intensity of the light, s1 and s2 are the intensity of 0° and +45° polarization respectively, and s3 is the intensity of right circular polarization. Here
Figure imgf000025_0005
and
Figure imgf000025_0006
are the spherical coordinates of the three-dimensional vector of Cartesian coordinates [s1, s2, s3].
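For reference, the textbook form of the Stokes vector consistent with the description above is sketched below. The symbols I (total intensity), ρ (degree of polarization), ψ (orientation angle) and χ (ellipticity angle) are assumptions chosen for illustration; the equation images of the original may use different notation.

```latex
\mathbf{s} =
\begin{pmatrix} s_0 \\ s_1 \\ s_2 \\ s_3 \end{pmatrix}
=
\begin{pmatrix}
I \\
I\rho \cos 2\psi \cos 2\chi \\
I\rho \sin 2\psi \cos 2\chi \\
I\rho \sin 2\chi
\end{pmatrix},
\qquad
\rho = \frac{\sqrt{s_1^2 + s_2^2 + s_3^2}}{s_0}
```

In this form, (Iρ, 2ψ, 2χ) are the spherical coordinates of the Cartesian vector [s1, s2, s3].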
Mueller Calculus
Upon reflection, the incident polarization state of light is altered based on the following Mueller calculus:
Figure imgf000025_0004
where si and sref are Stokes vectors of the incident light and reflected light respectively,
Figure imgf000025_0011
is the Mueller matrix of rotation which rotates the incident Stokes vector in the global frame (same as the camera frame in our case) into the canonical frame of reference (plane of incidence),
Figure imgf000025_0007
is the concatenation of the Mueller reflection matrix and a linear retarder of phase δ . The
Figure imgf000025_0008
term rotates the result back to the camera frame, hence the
Figure imgf000025_0009
angle.
Mueller Rotation Matrix
The concatenation of the Mueller matrices of a linear di-attenuator
Figure imgf000025_0010
calculates the Stokes vectors of light upon reflection off the surface, in the local plane of incidence frame. However, the initial Stokes vectors are defined in the global frame and therefore the Mueller rotation Matrix is required to align these two frames:
Figure imgf000026_0001
where Φ is the angle between the y direction of the right-hand global frame and the normal n of the surface.
Reflection and Transmission
An optical reflector which alters the polarization state of the incident light beam upon reflection can be described as a concatenation of the Mueller reflection matrix and a linear retarder of phase δ:
Figure imgf000026_0002
where and
Figure imgf000026_0004
are parallel and perpendicular specular reflectance coefficients as calculated by the Fresnel equations, and δ is the relative phase between the parallel and perpendicular polarized components. The phase shift δ is a step function for dielectric materials: δ = π for any incidence angle below the Brewster angle, and δ = 0 otherwise.
In case of diffuse polarization, specular reflectance coefficients are replaced by transmission coefficients:
Figure imgf000026_0003
and the refractive index of the material that light is incident on becomes 1 /n2 as the light gets scattered and comes out from the material.
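The specular reflection chain described in the preceding subsections (rotate into the plane of incidence, apply the reflection matrix with retardance, rotate back) can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the implementation of the method: the function names are hypothetical, the matrices use one common textbook sign convention that may differ from the equation images, and light is assumed to arrive from air onto a dielectric of refractive index n.

```python
import numpy as np


def rotation(phi):
    """Mueller rotation matrix for a frame rotation by phi (radians); sign convention assumed."""
    c, s = np.cos(2 * phi), np.sin(2 * phi)
    return np.array([[1, 0, 0, 0],
                     [0, c, -s, 0],
                     [0, s, c, 0],
                     [0, 0, 0, 1]], dtype=float)


def fresnel_reflectance(theta, n):
    """Perpendicular and parallel power reflectance (air to dielectric of index n)."""
    cos_i = np.cos(theta)
    cos_t = np.sqrt(1.0 - (np.sin(theta) / n) ** 2)  # Snell's law
    r_perp = ((cos_i - n * cos_t) / (cos_i + n * cos_t)) ** 2
    r_par = ((n * cos_i - cos_t) / (n * cos_i + cos_t)) ** 2
    return r_perp, r_par


def reflection(theta, n):
    """Reflection matrix combined with a linear retarder of phase delta."""
    r_perp, r_par = fresnel_reflectance(theta, n)
    # Step function from the text: pi below the Brewster angle, 0 otherwise.
    delta = np.pi if theta < np.arctan(n) else 0.0
    a = 0.5 * (r_perp + r_par)
    b = 0.5 * (r_perp - r_par)
    g = np.sqrt(r_perp * r_par)
    return np.array([[a, b, 0, 0],
                     [b, a, 0, 0],
                     [0, 0, g * np.cos(delta), g * np.sin(delta)],
                     [0, 0, -g * np.sin(delta), g * np.cos(delta)]])


def reflect(s_in, theta, n, phi):
    """Rotate into the plane of incidence, reflect, then rotate back to the camera frame."""
    s_in = np.asarray(s_in, dtype=float)
    return rotation(phi) @ reflection(theta, n) @ rotation(-phi) @ s_in
```

For example, `reflect([1, 0, 0, 0], np.arctan(1.5), 1.5, 0.4)` models unpolarized light reflected at the Brewster angle of a material with n = 1.5; the parallel reflectance vanishes there, so the reflected light is fully linearly polarized.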
Synthetic Data Generation
According to G. Atkinson and E. Hancock: "Recovery of surface orientation from diffuse polarization", IEEE Transactions on Image Processing, volume 15, pp. 1653-1664 (2006) ("Atkinson & Hancock"), the degree of polarization (DOP) can be calculated as:
Figure imgf000027_0001
Although equation A5 gives the correct diffuse polarization orientation in renderings compared to real measurements, the DOP does not match actual observations. The observed diffuse DOP can go up to approximately 10% at an incidence angle of roughly 85° for common dielectric materials. In contrast, Atkinson & Hancock ibid. report the diffuse DOP as reaching roughly 25% for materials with an index of refraction (IOR) of 1.4 at an 85° admittance angle.
In practice, a small amount of specular reflection, with a polarization orientation opposite to that of the diffuse reflection, slightly reduces the diffuse DOP, explaining the 10% observed.
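To make the roughly 25% figure concrete, the closed-form diffuse DOP expression published by Atkinson & Hancock can be evaluated directly. The sketch below is an illustration only: the function name is hypothetical, the angle is the emittance (zenith) angle in radians, and no claim is made that this is exactly the form given in the equation image above.

```python
import math


def diffuse_dop(theta: float, n: float) -> float:
    """Diffuse degree of polarization at emittance angle theta (radians), refractive index n."""
    sin2 = math.sin(theta) ** 2
    numerator = (n - 1.0 / n) ** 2 * sin2
    denominator = (2.0 + 2.0 * n ** 2
                   - (n + 1.0 / n) ** 2 * sin2
                   + 4.0 * math.cos(theta) * math.sqrt(n ** 2 - sin2))
    return numerator / denominator


if __name__ == "__main__":
    # Theoretical diffuse DOP for IOR 1.4 at 85 degrees.
    print(diffuse_dop(math.radians(85.0), 1.4))
```

For an IOR of 1.4 at 85°, this expression gives a value of about 0.26, in line with the figure quoted from Atkinson & Hancock and well above the approximately 10% observed in measurements.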
Figures 12A and 12B show measured diffuse DOP (y-axis) for different angles of incidence (x-axis: in radians) on two spherical balls. Figure 12C shows simulated diffuse DOP using a polynomial fit to measured data.
To better simulate real-world diffuse polarization, the diffuse polarization is rendered based on equation A5, with the following approximations:
- The diffuse polarization calculated for θ from range
Figure imgf000027_0003
is stretched and mapped to the range 0 - π. This is due to a mirror reflection assumption in equation A5, which yields a total internal reflection within the range
Figure imgf000027_0002
- A polynomial function is further applied to the diffuse intensity
Figure imgf000027_0004
- When calculating the θ angle for specular polarization, a half vector
Figure imgf000027_0005
of the light direction
Figure imgf000027_0006
and view direction is used to replace the normal vector.
Deep network Architecture
Referring again to Figure 8, the architecture is based on U-Net 29 with a joint encoder 31 with 9 convolutions with stride 2 and kernel size 4. Between each layer, a Leaky ReLU (α = 0.2) activation function and Instance Normalization are used. Global statistics are maintained by using the global feature secondary track 30. Reference is made to V. Deschaintre et al. ibid. The decoder 32 is split into three branches 331, 332, 333 specialized in different aspects of appearance. The branches 331, 332, 333 respectively output (1) depth and normal 185, 181, (2) diffuse albedo 182 and (3) roughness and specular albedo 183, 184. Each branch 331, 332, 333 is symmetric to the encoder 31 with 9 deconvolutions. Between each layer, a Leaky ReLU (α = 0.2) activation function is also used. Each deconvolution is composed of a 2x upsampling and two 3x3 convolutions with stride 1.
The encoder 31 is connected to the decoder branches through skip connections 34 to propagate high-frequency details. Two residual blocks 35, 36 and a 3x3 convolution are added to each skip connection 34, allowing the network 22 to learn which information is most relevant to each decoder branch 331, 332, 333. More than two residual blocks can be used. Each residual block 35, 36 is composed of two 3x3 convolutional layers with stride 1 and ReLU activation functions.
Training
The network 22 was trained for 5 days (1,000,000 steps) on a single GPU (an Nvidia RTX 2080 Ti). A batch size of 2 and a learning rate of 0.00002 were used. The network is fully convolutional and was trained on 512 x 512 images.
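As a quick check of how the encoder depth relates to this training resolution, the spatial size after each of the nine stride-2, kernel-4 convolutions can be computed with the standard convolution output formula. The padding of 1 is an assumption made here for illustration (the text does not state the padding), and the function names are hypothetical.

```python
from __future__ import annotations


def conv_out_size(size: int, kernel: int = 4, stride: int = 2, padding: int = 1) -> int:
    """Spatial output size of one convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def encoder_sizes(size: int = 512, layers: int = 9) -> list[int]:
    """Feature-map side length after each stride-2 encoder convolution."""
    sizes = []
    for _ in range(layers):
        size = conv_out_size(size)
        sizes.append(size)
    return sizes


if __name__ == "__main__":
    print(encoder_sizes())  # side length after each of the nine layers
```

Under this padding assumption each layer halves the resolution, so a 512 x 512 input is reduced to a 1 x 1 bottleneck after nine layers.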
The loss function uses a distance between the parameter maps for regularization, with a weight of 0.25, and a polarized rendering loss, computing four polarization angles for three different lighting conditions, with a weight of 1.0. The distance between parameters is measured with an L1 distance, except for the normal map, for which a cosine distance is used.
Illumination
As explained earlier, images can generally be acquired under three scenarios:
Images can be acquired using frontal flash, in which case diffuse polarization dominates and the Stokes map is based on diffuse polarization. Diffuse polarization is independent of the polarization state of incoming illumination. Thus, the flash light can be unpolarized, linearly polarized or even circularly polarized.
Referring to Figures 13A to 13D, 14 and 15, polarization of the flash does not change the measured Stokes map, and the deep network could be trained with training data simulating the unpolarized or polarized state of flash illumination and/or with real measurements under such illumination. Figures 13A to 13D illustrate simulation of a linearly-polarized frontal flash on a sphere with tiled green stone material, and Figure 15 shows the resulting Stokes map, which is dominated by diffuse polarization and is independent of the polarization state of the flash.
Images can be acquired using uniform surrounding illumination (for example, spherical or hemispherical), in which case specular polarization dominates. In this case too, very similar Stokes maps can be obtained using unpolarized or circularly polarized illumination. The Stokes map due to specular polarization is a rotated version of the Stokes map due to diffuse polarization. Thus, the deep network could be trained with training data simulating the unpolarized or circularly polarized state of uniform surrounding spherical/hemispherical illumination and/or with similar real measured data.
The main difference between flash illumination and surrounding illumination is that, with uniform surrounding illumination, if the incident illumination is linearly polarized in a specific orientation, then the resulting Stokes map may not be a good cue for surface shape (unless the object is planar). On the other hand, linearly-polarized illumination can provide very good reflectance cues for diffuse and specular albedo.
Referring again to Figures 7A, 7B and 7C, a special case is planar objects, where uniform illumination on a planar object can be achieved using just an area-light source, light panel or display panel that is sufficiently large, or locally uniform environmental illumination. As shown in Figures 7A, 7B and 7C, for such planar objects, an area-light/panel illumination or environmental illumination can be incident from the front (near normal incidence) or obliquely at near the Brewster angle of incidence. Specular polarization dominates in the resulting surface reflectance of a planar object and, similar to the surrounding illumination case, the deep network could be trained with training data simulating an unpolarized or circularly polarized state of uniform illumination from an area-light/light-panel or environment and/or with similar real measured data. Furthermore, for the case of planar objects, the deep network could also be trained with training data simulating a linearly polarized state of uniform illumination from an area-light/light-panel or environment and/or with similar real measured data.
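The Stokes maps referred to throughout this section can be computed per pixel from a small set of linearly polarized images. The sketch below is illustrative only: it assumes three images taken through an ideal linear polarizer at 0°, 45° and 90° (other image sets from which s0, s1 and s2 are determinable would need different arithmetic), the function names are hypothetical, and the normalisation shown is one plausible choice rather than the one necessarily used by the method.

```python
import numpy as np


def linear_stokes(i0, i45, i90):
    """Per-pixel s0, s1, s2 from images behind a linear polarizer at 0, 45 and 90 degrees."""
    i0, i45, i90 = (np.asarray(x, dtype=np.float64) for x in (i0, i45, i90))
    s0 = i0 + i90        # total intensity
    s1 = i0 - i90        # 0 degree versus 90 degree component
    s2 = 2.0 * i45 - s0  # +45 degree versus -45 degree component
    return s0, s1, s2


def polarization_maps(s0, s1, s2, eps=1e-8):
    """Degree of linear polarization, angle of polarization and normalised (s1, s2)."""
    mag = np.sqrt(s1 ** 2 + s2 ** 2)
    dolp = mag / np.maximum(s0, eps)
    aop = 0.5 * np.arctan2(s2, s1)
    n1 = s1 / np.maximum(mag, eps)
    n2 = s2 / np.maximum(mag, eps)
    return dolp, aop, n1, n2


def polarizer_image(s0, s1, s2, angle_rad):
    """Intensity seen through an ideal linear polarizer at the given angle."""
    return 0.5 * (s0 + s1 * np.cos(2 * angle_rad) + s2 * np.sin(2 * angle_rad))
```

`polarizer_image` is the forward model that `linear_stokes` inverts, so it can also be used to simulate polarized images at arbitrary polarizer angles from a given set of Stokes parameters.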
Applications
Image capture for shape and spatially varying reflectance estimation as described here can be used to render images used in computer graphics applications such as visualization, visual effects, augmented reality, virtual reality, computer games and e-commerce.
Modifications
It will be appreciated that various modifications may be made to the embodiments hereinbefore described. Such modifications may involve equivalent and other features which are already known in the design, manufacture and use of systems for acquiring shape and spatially-varying reflectance of objects, and component parts thereof and which may be used instead of or in addition to features already described herein. Features of one embodiment may be replaced or supplemented by features of another embodiment.
The object may be a plant, animal or human (e.g., the whole body) or a part of a plant, animal or human (such as a face or hand). The object may be an inanimate object or part of an inanimate object.
Although claims have been formulated in this application to particular combinations of features, it should be understood that the scope of the disclosure of the present invention also includes any novel features or any novel combination of features disclosed herein either explicitly or implicitly or any generalization thereof, whether or not it relates to the same invention as presently claimed in any claim and whether or not it mitigates any or all of the same technical problems as does the present invention. The applicants hereby give notice that new claims may be formulated to such features and/or combinations of such features during the prosecution of the present application or of any further application derived therefrom.

Claims
1. A method, comprising: receiving a set of at least three images of an object including at least two linearly-polarized images and at least one color image, wherein the three images have the same view of the object and are acquired under the same illumination condition in which either diffuse polarization or specular polarization dominates in surface reflectance, wherein a set of Stokes parameters s0, s1 and s2 is determinable from the at least three images; generating three-dimensional shape and spatially-varying reflectance of the object from the set of at least three images using a deep neural network trained with a plurality of sets of training images, each of the plurality of sets of training images including at least three training images including at least two linearly-polarized training images and at least one color image from which a respective set of Stokes parameters s0, s1 and s2 is determinable; and storing said three-dimensional shape and spatially-varying reflectance generated by the deep neural network.
2. The method of claim 1, further comprising: receiving a polarization shape map generated from the Stokes parameters s1 and s2 for the object, and/or a colour map and/or a degree of polarization (DOP) map; wherein the three-dimensional shape and spatially-varying reflectance is generated from the set of at least three images and the polarization shape map and/or the colour map and/or the DOP map.
3. The method of claim 1, further comprising: generating a polarization shape map from the Stokes parameters s1 and s2 for the object and/or a colour map and/or a degree of polarization (DOP) map using the set of at least three images; wherein the three-dimensional shape and spatially-varying reflectance is generated from the set of at least three images and the polarization shape map and/or the colour map and/or the DOP map.
4. The method of claim 2 or 3, wherein the color map is a diffuse color map.
5. The method of any one of claims 2 to 4, wherein the polarization shape map is a normalised Stokes map or an angle of polarization map.
6. The method of any one of claims 1 to 5, wherein the plurality of sets of training images comprises a plurality of sets of synthesized training images.
7. The method of any one of claims 1 to 6, wherein the plurality of sets of training images comprises a plurality of sets of measured training images.
8. The method of any one of claims 1 to 7, wherein the at least three images of the object comprise three linearly-polarized color images.
9. The method of any one of claims 1 to 8, wherein the three-dimensional shape comprises: a surface normal map, and/or a depth map.
10. The method of any one of claims 1 to 9, wherein the spatially-varying reflectance comprises: a diffuse albedo map, and a specular albedo map, and/or a specular roughness map.
11. The method of any one of claims 1 to 10, wherein the deep neural network comprises a convolutional neural network having an encoder and a decoder and skip connections between the encoder and decoder.
12. The method of claim 11, wherein the decoder is a branched decoder comprising of at least two branches.
13. The method of claim 11 or 12, wherein the skip connections include at least one residual block or a series of at least two residual blocks.
14. The method of any one of claims 1 to 13, wherein the deep neural network is trained by considering rendering losses that include polarized rendering loss over simulated linearly polarized images.
15. The method of any one of claims 1 to 14, wherein the set of at least three images are acquired using frontal flash illumination incident on the object so as to cause diffuse polarization to dominate in the surface reflectance.
16. The method of claim 15, wherein the frontal flash illumination is unpolarized.
17. The method of claim 15, wherein the frontal flash illumination is linearly-polarized or circularly-polarized.
18. The method of any one of claims 1 to 14, wherein the set of at least three images are acquired using uniform illumination disposed around and directed at the object so as to cause specular polarization to dominate in the surface reflectance.
19. The method of claim 18, wherein the uniform illumination is unpolarized or circularly-polarized.
20. The method of claim 18, wherein the object is a planar object and wherein uniform illumination is linearly-polarized.
21. The method of any one of claims 18 to 20, wherein the uniform illumination comprises: one or more light sources and, optionally, one or more reflecting surfaces arranged around the object to provide uniform illumination on the object, optionally wherein the one or more light sources comprise a plurality of light sources arranged to substantially cover a hemisphere or sphere of directions around the object.
22. A computer program comprising instructions which, when executed by at least one processor, causes the at least one processor to perform the method of any one of claims 1 to 21.
23. A computer program product comprising a computer-readable medium storing a computer program comprising instructions which, when executed by at least one processor, causes the at least one processor to perform the method of any one of claims 1 to 21, optionally the computer-readable medium may be non-transitory.
24. A device, comprising: at least one processor; and storage; the at least one processor configured: in response to receiving a set of at least three images of an object including at least two linearly-polarized images and at least one color image, wherein the three images have the same view of the object and are acquired under the same illumination condition in which either diffuse polarization or specular polarization dominates in surface reflectance, wherein a set of Stokes parameters s0, s1 and s2 is determinable from the at least three images, to generate three-dimensional shape and spatially-varying reflectance of the object from the set of at least three images using a deep neural network trained with a plurality of sets of training images, each of the plurality of sets of training images including at least three training images including at least two linearly-polarized training images and at least one color training image from which a respective set of Stokes parameters s0, s1 and s2 is determinable and to store said three-dimensional shape and spatially-varying reflectance generated by the deep neural network in the storage.
PCT/GB2022/050470 2021-02-22 2022-02-21 Image processing WO2022175688A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
GB2102482.3A GB2603951A (en) 2021-02-22 2021-02-22 Image Processing
GB2102482.3 2021-02-22
US17/224,401 2021-04-07
US17/224,401 US11410378B1 (en) 2021-02-22 2021-04-07 Image processing for generating three-dimensional shape and spatially-varying reflectance of the object using a deep neural network

Publications (1)

Publication Number Publication Date
WO2022175688A1 true WO2022175688A1 (en) 2022-08-25

Family

ID=80447735

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2022/050470 WO2022175688A1 (en) 2021-02-22 2022-02-21 Image processing

Country Status (1)

Country Link
WO (1) WO2022175688A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115265786A (en) * 2022-09-30 2022-11-01 长春理工大学 Strong light polarization detection device capable of automatically adjusting exposure value and detection method thereof

Non-Patent Citations (12)

* Cited by examiner, † Cited by third party
Title
A. GHOSH ET AL.: "Circularly polarized spherical illumination reflectometry", ACM TRANS. GRAPH. (PROC. SIGGRAPH ASIA), vol. 29, 2010, pages 1 - 12
A. KADAMBI ET AL.: "Polarized 3D: High-quality depth sensing with polarization cues", PROCEEDINGS OF THE IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION, 2015, pages 3370 - 3378, XP032866690, DOI: 10.1109/ICCV.2015.385
G. ATKINSON, E. HANCOCK: "Recovery of surface orientation from diffuse polarization", IEEE TRANSACTIONS ON IMAGE PROCESSING, vol. 15, 2006, pages 1653 - 1664, XP002604501, DOI: 10.1109/TIP.2006.871114
G. C. GUARNERA ET AL.: "Estimating surface normals from spherical stokes reflectance fields", ECCV WORKSHOP ON COLOR AND PHOTOMETRY IN COMPUTER VISION, 2012, pages 340 - 349, XP047033461, DOI: 10.1007/978-3-642-33868-7_34
J. RIVIERE ET AL.: "Polarization imaging reflectometry in the wild", ACM TRANSACTIONS ON GRAPHICS, vol. 36, no. 6, 2017, XP058473842, DOI: 10.1145/3130800.3130894
M. BOSS ET AL.: "Two-shot spatially-varying brdf and shape estimation", IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020
RIVIERE JÉRÉMY ET AL: "Polarization imaging reflectometry in the wild", ACM TRANSACTIONS ON GRAPHICS, ACM, NY, US, vol. 36, no. 6, 20 November 2017 (2017-11-20), pages 1 - 14, XP058473842, ISSN: 0730-0301, DOI: 10.1145/3130800.3130894 *
V. DESCHAINTRE ET AL.: "Guided fine-tuning for large-scale material transfer", PROCEEDINGS OF THE EUROGRAPHICS SYMPOSIUM ON RENDERING, vol. 39, no. 4, 2020
V. DESCHAINTRE ET AL.: "Single-Image SVBRDF Capture with a Rendering-Aware Deep Network", ACM TRANSACTIONS ON GRAPHICS, vol. 37, no. 4, 2018, XP002804630 *
V. DESCHAINTRE ET AL.: "Single-Image SVBRDF Capture with a Rendering-Aware Deep Network", ACM TRANSACTIONS ON GRAPHICS, vol. 37, no. 4, 2018, XP058634470, DOI: 10.1145/3197517.3201378
Y. BA ET AL.: "Deep shape from polarization", EUROPEAN CONFERENCE ON COMPUTER VISION (ECCV), 2020
Z. LI ET AL.: "Learning to reconstruct shape and spatially-varying reflectance from a single image", ACM TRANSACTIONS ON GRAPHICS, vol. 37, no. 6, 2018, XP058464764, DOI: 10.1145/3272127.3275055

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22705872

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22705872

Country of ref document: EP

Kind code of ref document: A1