WO2020181360A1 - Layered scene decomposition codec system and methods - Google Patents

Layered scene decomposition codec system and methods

Publication number
WO2020181360A1
Authority
WO
WIPO (PCT)
Prior art keywords
light field
layers
scene
data set
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CA2020/050228
Other languages
English (en)
French (fr)
Inventor
Matthew Hamilton
Chuck Rumbolt
Donovan BENOIT
Matthew Troke
Robert Lockyer
Thomas Butyn
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Avalon Holographics Inc
Original Assignee
Avalon Holographics Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Avalon Holographics Inc
Priority to CN202080016205.2A (CN113748682B, zh)
Priority to KR1020217028928A (KR102602719B1, ko)
Priority to CN202511380095.9A (CN121509648A, zh)
Priority to CA3127545A (CA3127545C, en)
Priority to JP2021544756A (JP7387193B2, ja)
Publication of WO2020181360A1 (en)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 Three-dimensional [3D] image rendering
    • G06T15/06 Ray-tracing
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three-dimensional [3D] modelling for computer graphics
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/55 Depth or shape recovery from multiple images
    • G06T7/557 Depth or shape recovery from multiple images from light fields, e.g. from plenoptic cameras
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106 Processing image signals
    • H04N13/161 Encoding, multiplexing or demultiplexing different image signal components
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/30 Image reproducers
    • H04N13/302 Image reproducers for viewing without the aid of special glasses, i.e. using autostereoscopic displays
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/119 Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/132 Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146 Data rate or code amount at the encoder output
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/172 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/187 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a scalable video layer
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/30 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H04N19/33 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability in the spatial domain
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N2013/0074 Stereoscopic image analysis
    • H04N2013/0088 Synthesising a monoscopic image signal from stereoscopic images, e.g. synthesising a panoramic or high resolution monoscopic image

Definitions

  • the present disclosure relates to image (light field) data encoding and decoding, including data compression and decompression systems and methods for the provision of interactive multi-dimensional content at a light field display.
  • Data may be compressed for various types of transmission, such as, but not limited to: long-distance transmission of data over the internet or Ethernet networks; or transmission of a synthetic multiple-view created by a graphical processing unit (GPU) and transferred to a display device.
  • Such data may be used for video streaming, real-time interactive gaming, or any other light field display.
  • CODECs encoder-decoders
  • Olsson et al. teach compression techniques where an entire light field data set is processed to reduce redundancy and produce a compressed representation. Subcomponents (i.e., elemental images) of the light field are treated as a video sequence to exploit redundancy using standard video coding techniques.
  • Vetro et al. teach multiple-view specializations of compression standards that exploit redundancy between the light field subcomponents to achieve better compression rates, but at the expense of more intensive processing. These techniques may not achieve a sufficient compression ratio, and when a good ratio is achieved the encoding and decoding processes are beyond real-time rates.
  • These approaches assume that the entire light field exists in a storage disk or memory before being encoded. Therefore, large light field displays requiring large numbers of pixels introduce excessive latency when reading from a storage medium.
  • US Patent No. 9727970 discloses a distributed, in parallel (multi-processor) computing method and apparatus for generating a hologram by separating 3D image data into data groups, calculating from the data groups hologram values to be displayed at different positions on the holographic plane and summing the values for each position for generating a holographic display.
  • the strategies applied involve manipulating light at a smaller scale than a light field and, in this instance, are characterized by sorting and dividing the data according to colour, followed by colour image planes, and then further dividing the plane images into sub-images.
  • US Patent Publication No. 20170142427 describes content adaptive light field compression based on the collapsing of multiple elemental images (hogels) into a single hogel. The disclosure describes achieving a guaranteed compression rate; however, image lossiness varies, and in combining hogels as disclosed there is no guarantee of redundancy that can be exploited.
  • US Patent Publication No. 20160360177 describes methods for full parallax compressed light field synthesis utilizing depth information and relates to the application of view synthesis methods for creating a light field from a set of elemental images that form a subset of a total set of elemental images. The view synthesis techniques described herein do not describe or give methods to handle reconstruction artifacts caused during backwards warping.
  • US Patent Publication No. 20150201176 describes methods for full parallax compressed light field 3D imaging systems, disclosing the subsampling of elemental images in a light field based on the distance of the objects in the scene being captured. Though the methods describe the possibility of down-sampling the light field using simple conditions that could enhance the speed of encoding, in the worst case 3D scenes exist where no down-sampling would occur, and the encoding would fall back on transform encoding techniques which rely on the entire light field existing prior to encoding.
  • the present invention relates generally to 3D image data encoding and decoding for driving a light field display in real-time, which overcomes or can be implemented with present hardware limitations.
  • Light field or 3D scene data is deconstructed into subsets, which may be referred to as layers (corresponding to layered light fields), or data layers, sampled and rendered to compress the data for transmission and then decoded to construct and merge light fields corresponding to the data layers at a light field display.
  • a computer-implemented method comprising: receiving a first data set comprising a three-dimensional description of a scene; partitioning the first data set into a plurality of layers each representing a different portion of the scene at a different location with respect to a reference location; partitioning data corresponding to at least one of the layers into a plurality of subsections, wherein a location of a particular subsection is determined in accordance with geometry of at least a portion of an object represented within the scene; and encoding multiple layers and multiple subsections to generate a second data set.
  • a computer-implemented method comprising: receiving a first data set comprising a three-dimensional description of a scene, the first data set comprising information on directions of normals on surfaces in the scene, the directions of the normals represented with respect to a reference direction, wherein at least some of the surfaces have non-Lambertian reflection properties; partitioning the first data set into a plurality of layers, each layer representing
  • a light field image rendering method comprising the steps of: partitioning a three-dimensional surface description of a scene into layers, each layer having an associated light field and sampling scheme; further partitioning at least one layer into a plurality of subsections, each subsection having an associated light field and sampling, wherein a location of a particular subsection is determined in accordance with geometry of at least a portion of an object represented within the scene; rendering a first set of pixels, comprising extra-pixel information, for each layer and each subsection in accordance with the sampling scheme, the first set of pixels corresponding to a sampled light field; reconstructing the sampled light field for each layer and subsection using the first set of pixels; and merging the reconstructed light fields into a single output light field image.
  • a computer-implemented method comprising: receiving a first data set comprising a three-dimensional description of a scene; partitioning the first data set into a plurality of layers each representing a portion of the scene at a location with respect to a reference location; obtaining, for each of the plurality of layers, one or more polygons representative of corresponding portions of objects in the scene; determining, based on the one or more polygons, a view-independent representation; and encoding the view-independent representation as a portion of a second data set, wherein a size of the second data set is smaller than a size of the first data set.
  • a computer-implemented method comprising: receiving a first data set comprising a three-dimensional description of a scene; partitioning the first data set into a plurality of layers each representing a portion of the scene at a location with respect to a reference location; and encoding multiple layers to generate a second data set by performing a sampling operation on the layers comprising: using an effective resolution function to determine a suitable sampling rate; and downsampling elemental images associated with a layer using the suitable sampling rate, wherein a size of the second data set is smaller than a size of the first data set.
  • a computer-implemented method comprising: receiving a first data set comprising a three-dimensional description of a scene, the first data set comprising information on transparency of surfaces in the scene; partitioning the first data set into a plurality of layers each representing a portion of the scene at a location with respect to a reference location; and encoding multiple layers to generate a second data set, wherein a size of the second data set is smaller than a size of the first data set.
  • Embodiments can include one or more of the following features.
  • the second data set is transmitted to a remote device for the scene to be presented at a display device associated with the remote device.
  • encoding a layer or subsection comprises performing a sampling operation on a corresponding portion of the first data set.
  • the sampling operation is based on a target compression rate associated with the second data set.
  • encoding multiple layers or multiple subsections comprises performing a sampling operation on a corresponding portion of the first data set, wherein performing the sampling operation comprises: rendering, using ray tracing, a set of pixels to be encoded; selecting multiple elemental images from a plurality of elemental images such that the set of pixels are rendered using the selected multiple elemental images; and sampling the set of pixels using a sampling operation.
  • the sampling operation comprises selecting multiple elemental images, from a corresponding portion of the plurality of elemental images, in accordance with a plenoptic sampling scheme.
  • performing the sampling operation comprises: determining an effective spatial resolution associated with the layer or subsection;
  • the angular resolution is determined as a function of a directional resolution associated with the portion of the scene associated with the layer or subsection.
  • the angular resolution is determined as a field of view associated with a display device.
  • the three-dimensional description comprises light field data representing a plurality of elemental images.
  • each of the plurality of elemental images is captured by one or more image acquisition devices.
  • the first data set comprises information on directions of normals on surfaces included in the scene, the directions of the normals being represented with respect to a reference direction.
  • reflection properties of at least some of the surfaces are non-Lambertian.
  • encoding a layer or a subsection further comprises:
  • layers located closer to the display surface achieve a lower compression ratio than layers of the same width located further away from the display surface.
  • the multiple layers of the second data set comprise light fields.
  • the method further comprises merging the light fields to create a final light field.
  • the partitioning of the layers comprises restricting the depth range of each layer.
  • the layers located closer to the display surface are narrower in width than layers located farther away from the display surface.
  • the partitioning of the first data set into a plurality of layers maintains a uniform compression rate across the scene.
  • the partitioning of the first data set into a plurality of layers comprises partitioning the light field display into inner and outer frustum volume layer sets.
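The layering features above (restricted depth ranges, narrower layers near the display surface, separate inner and outer frustum layer sets) can be sketched as follows. This is a minimal illustration only: the geometric growth factor, the function names `partition_depth_layers` and `frustum_layer_sets`, and the convention of measuring depth as distance from the display surface are assumptions made for the example, not the layering scheme of the disclosure.

```python
def partition_depth_layers(max_depth, num_layers, growth=2.0):
    """Split the depth range [0, max_depth] into contiguous layers.

    Depth 0 is the display surface. Each layer is `growth` times wider
    than the one before it, so layers near the display are narrower.
    The geometric growth rule is an illustrative assumption.
    """
    if num_layers < 1 or max_depth <= 0 or growth <= 0:
        raise ValueError("need num_layers >= 1, max_depth > 0, growth > 0")
    # Relative widths 1, g, g^2, ... scaled so that they sum to max_depth.
    weights = [growth ** i for i in range(num_layers)]
    scale = max_depth / sum(weights)
    layers = []
    near = 0.0
    for weight in weights:
        far = near + weight * scale
        layers.append((near, far))
        near = far
    return layers


def frustum_layer_sets(inner_depth, outer_depth, num_layers, growth=2.0):
    """Build one layer set per frustum volume (hypothetical helper).

    Each set is expressed as distances from the display surface, measured
    into the inner and outer frustum volumes respectively.
    """
    return {
        "inner": partition_depth_layers(inner_depth, num_layers, growth),
        "outer": partition_depth_layers(outer_depth, num_layers, growth),
    }
```

With `partition_depth_layers(15.0, 4)` the layer widths are 1, 2, 4 and 8 depth units, so the layer touching the display surface is the narrowest.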
  • the method is used to generate a synthetic light field for multi-dimensional video streaming, multi-dimensional interactive gaming, real-time interactive content, or other light field display scenarios.
  • the synthetic light field is generated only in a valid viewing zone.
  • a computer method for rendering a light field image comprising: partitioning a three-dimensional surface description of a scene into layers, each layer having an associated light field and sampling scheme; further partitioning at least one layer into a plurality of subsections, each subsection having an associated light field and sampling, wherein a location of a particular subsection is determined in accordance with geometry of at least a portion of an object represented within the scene; rendering a first set of pixels, comprising extra-pixel information, for each layer and each subsection in accordance with the sampling scheme and corresponds to a sampled light field; reconstructing the sampled light field for each layer and subsection using the first set of pixels; and merging the reconstructed light fields into a single output light field image.
  • the first set of pixels and associated extra-pixel information is partitioned into subsets, whereby reconstruction of the sampled light fields for each layer and merging are performed using pixels from a single subset in a cache to create some subset of the output light field image.
  • reconstructing the sampled light field for each layer is performed by re-projecting pixels in the first set from a cache to create some subset of the output light field image.
  • re-projecting pixels is performed using a warping process along a single dimension in the first set of pixels, followed by a second warping process in a second dimension in the first set of pixels.
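The two-stage re-projection described above (a warp along one dimension followed by a warp along the second) can be sketched as below. This is an illustrative forward warp under stated assumptions, not the disclosed implementation: each pixel is assumed to carry a colour plus a per-pixel disparity as its extra-pixel information, a larger disparity is assumed to mean a nearer surface, and the names `warp_1d` and `warp_two_pass` are hypothetical.

```python
def warp_1d(line, shift):
    """Forward-warp one row or column of pixels.

    line:  list of (colour, disparity) tuples, or None for a hole.
    shift: view offset along this axis; a pixel moves by
           round(shift * disparity) positions.
    """
    out = [None] * len(line)
    for i, pixel in enumerate(line):
        if pixel is None:
            continue
        _, disparity = pixel
        j = i + round(shift * disparity)
        if 0 <= j < len(out):
            # Depth test: the nearer surface (larger disparity) wins.
            if out[j] is None or disparity > out[j][1]:
                out[j] = pixel
    return out


def warp_two_pass(image, dx, dy):
    """Re-project an image to a view offset by (dx, dy).

    Pass 1 warps every row horizontally; pass 2 warps every column of
    the intermediate result vertically.
    """
    rows = [warp_1d(row, dx) for row in image]
    cols = [warp_1d(list(col), dy) for col in zip(*rows)]
    # Transpose back to row-major order.
    return [list(row) for row in zip(*cols)]
```

Positions left as `None` are disocclusion holes. A complete reconstruction would fill them afterwards, for example by interpolation, which this sketch leaves out.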
  • Figure 1 is a schematic representation (block diagram) of an embodiment of a layered scene decomposition (CODEC) system according to the present disclosure.
  • Figure 2 is a schematic top-down view of the inner frustum volume and outer frustum volume of a light field display.
  • Figure 3A illustrates schematically the application of edge adaptive interpolation for pixel reconstruction according to the present disclosure.
  • Figure 3B illustrates a process flow for reconstructing a pixel array.
  • Figure 4 illustrates schematically elemental images specified by a sampling scheme within a pixel matrix, as part of the image (pixel) reconstruction process according to the present disclosure.
  • Figure 5 illustrates schematically a column-wise reconstruction of a pixel matrix, as part of the image (pixel) reconstruction process according to the present disclosure.
  • Figure 6 illustrates a subsequent row-wise reconstruction of the pixel matrix, as part of the image (pixel) reconstruction process according to the present disclosure.
  • Figure 7 illustrates schematically an exemplary CODEC system embodiment according to the present disclosure.
  • Figure 8 illustrates schematically an exemplary layered scene decomposition of an image data set (a layering scheme of ten layers) correlating to the inner frustum light field of a display.
  • Figure 9 illustrates schematically an exemplary layered scene decomposition of image data (two layering schemes of ten layers) correlating to the inner frustum and outer frustum light field regions, respectively, of a display.
  • Figure 10 illustrates an exemplary CODEC process flow according to the present disclosure.
  • Figure 11 illustrates an exemplary process flow for encoding 3D image (scene) data to produce layered and compressed core encoded (light field) representations, according to the present disclosure.
  • Figure 12 illustrates an exemplary process flow for decoding core encoded representations to construct a (display) light field at a display, according to the present disclosure.
  • Figure 13 illustrates an exemplary process flow for encoding and decoding residue image data for use with core image data to produce a (display/final) light field at a display according to the present disclosure.
  • Figure 14 illustrates an exemplary CODEC process flow including layered depth images, according to the present disclosure.
  • Figure 15 illustrates an exemplary CODEC process flow including specular light calculation, according to the present disclosure.
  • Figure 16 illustrates an alternate exemplary CODEC process flow including specular light calculation, according to the present disclosure.
  • Figure 17 illustrates an exemplary CODEC process flow including view independent rasterization, according to the present disclosure.
  • Figure 18 illustrates an exemplary CODEC process flow including performing a sampling operation using an effective resolution function, according to the present disclosure.
  • Figure 19 illustrates an observer-based constructed plane used to measure effective resolution at depth.
  • Figure 20 graphically illustrates the asymptotic nature of effective resolution with respect to scene depth.
  • Figure 21 illustrates an exemplary CODEC process flow including transparency, according to the present disclosure.
  • the present invention relates generally to CODEC systems and methods for light field data or multi-dimensional scene data compression and decompression to provide for the efficient (rapid) transmission and reconstruction of a light field at a light field display.
  • compositions, device, article, system, use or method denotes that additional elements and/or method steps may be present, but that these additions do not materially affect the manner in which the recited composition, device, article, system, method or use functions.
  • the term “consisting of” when used herein in connection with a composition, device, article, system, use or method excludes the presence of additional elements and/or method steps.
  • a composition, device, article, system, use or method described herein as comprising certain elements and/or steps may also, in certain embodiments consist essentially of those elements and/or steps, and in other embodiments consist of those elements and/or steps, whether or not these embodiments are specifically referred to.
  • the term “about” refers to an approximately +/-10% variation from a given value. It is to be understood that such a variation is always included in any given value provided herein, whether or not it is specifically referred to.
  • connection refers to any direct or indirect physical association between elements or features of the present disclosure. Accordingly, these terms may be understood to denote elements or features that are partly or completely contained within one another, attached, coupled, disposed on, joined together, in communication with, operatively associated with, etc., even if there are other elements or features intervening between the elements or features described as being connected.
  • the term “light field” at a fundamental level refers to a function describing the amount of light flowing in every direction through points in space, free of occlusions. Therefore, a light field represents radiance as a function of position and direction of light in free space.
  • a light field can be synthetically generated through various rendering processes or may be captured from a light field camera or from an array of light field cameras.
  • a light field may be described most generally as a mapping between a set of points in 3D space with a corresponding set of directions onto a set or sets of energy values.
  • these energy values would be red, green, blue color intensities, or potentially other radiation wavelengths.
  • the term “light field display” is a device which reconstructs a light field from a finite number of light field radiance samples input to the device.
  • the radiance samples represent the color components red, green and blue (RGB).
  • RGB red, green and blue
  • a light field can also be understood as a mapping from a four dimensional space to a single RGB color.
  • the four dimensions include the vertical and horizontal dimensions (x, y) of the display and two dimensions describing the directional components (u, v) of the light field.
  • a light field is defined as the function: LF: (x, y, u, v) → (r, g, b)
  • for a fixed x_f, y_f, LF(x_f, y_f, u, v) represents a two dimensional (2D) image referred to as an “elemental image”.
  • the elemental image is a directional image of the light field from the fixed x_f, y_f position. When a plurality of elemental images are connected side by side, the resulting image is referred to as an “integral image”.
  • the integral image can be understood as the entire light field required for the light field display.
  • the term “description of a scene” refers to a geometric description of a three-dimensional scene that can be a potential source from which a light field image or video can be rendered. This geometric description may be represented by, but is not limited to, points, quadrilaterals, and polygons.
  • the term “display surface” may refer to the set of points and directions as defined by a planar display plane and physical spacing of its individual light field hogel elements, as in a traditional 3D display. In the present disclosure, displays, as described herein, can be formed on curved surfaces, thus the set of points then would reside on the curved display surface, or any other desired display surface geometry that may be imagined. In the abstract mathematical sense, a light field may be defined and represented on any geometrical surface and may not necessarily correspond to a physical display surface with actual physical energy emission capabilities.
  • the term “elemental image” represents a two dimensional (2D) image, LF(x_f, y_f, u, v), for a fixed position x_f, y_f.
  • the elemental image is a directional image of the light field from the fixed x_f, y_f position.
  • the term “integral image” refers to the image that results when a plurality of elemental images are connected side by side.
  • the integral image can be understood as the entire light field required for the light field display.
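The relationship between the light field function, an elemental image and the integral image can be sketched with nested lists. This is an illustrative data layout only: the function names, the `lf[y][x][v][u]` index order and the synthetic `radiance` callback are assumptions made for the example.

```python
def make_light_field(mx, my, nu, nv, radiance):
    """Sample LF(x, y, u, v) -> (r, g, b) on a regular grid.

    mx, my: spatial resolution; nu, nv: directional resolution.
    radiance: callable (x, y, u, v) -> (r, g, b) supplying each sample.
    The result is indexed as lf[y][x][v][u].
    """
    return [[[[radiance(x, y, u, v) for u in range(nu)]
              for v in range(nv)]
             for x in range(mx)]
            for y in range(my)]


def elemental_image(lf, xf, yf):
    """Return the 2D directional image at the fixed position (xf, yf)."""
    return lf[yf][xf]


def integral_image(lf):
    """Tile all elemental images side by side into one 2D image."""
    rows = []
    for hogel_row in lf:                  # one fixed y per iteration
        nv = len(hogel_row[0])
        for v in range(nv):
            row = []
            for elem in hogel_row:        # every x at this y
                row.extend(elem[v])
            rows.append(row)
    return rows
```

For a light field with 2 x 2 spatial samples and 3 x 2 directional samples, the integral image has 4 rows and 6 columns, and the pixel at row `y * nv + v`, column `x * nu + u` holds the sample `LF(x, y, u, v)`.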
  • the term “layer” refers to any two parallel or non-parallel boundaries, with consistent or variable width, parallel or non-parallel to a display surface.
  • the term “pixel” refers to a light source and light emission mechanism used to create a display.
  • LSD Layered Scene Decomposition
  • a conventional display as previously known in the art consists of spatial pixels substantially evenly-spaced and organized in a two-dimensional array allowing for an idealized uniform sampling.
  • a three-dimensional display requires both spatial and angular samples. While the spatial sampling of a typical three-dimensional display remains uniform, the angular samples cannot necessarily be considered uniform in terms of the display’s footprint in angular space.
  • the angular samples, also known as directional components of the light field, can be parameterized in various ways, such as the planar parameterizations taught by Gortler et al. in “The Lumigraph”.
  • when the light field function is discretized in terms of position, the light field can be understood as a regularly-spaced array of planar-parameterized pinhole projectors, as taught by Chai in “Plenoptic Sampling”.
  • the elemental image LF(x_f, y_f, u, v) represents a two-dimensional image which may be understood as an image projected by a pinhole projector with an arbitrary ray parameterization.
  • the continuous elemental image is represented by a finite number of light field radiance samples.
  • said finite number of samples are mapped into the image plane as a regularly-spaced array (the regular spacing within the plane does not correspond to a regular spacing in the corresponding angular directional space).
  • planar parameterizations are not intended to limit the scope or spirit of the present disclosure, as the directional components of the light field can be parameterized by a variety of other arbitrary parameterizations.
  • lens distortions or other optical effects in a physically embodied pinhole projector can be modeled as distortions of the planar parameterization.
  • display components may be defined through a warping function, such as taught by Clark et al. in “A transformation method for the reconstruction of functions from nonuniformly spaced samples”.
  • a warping function a(u, v) defines a distorted planar parameterization of the pinhole projector, producing arbitrary alternate angular distributions of directional rays in the light field.
  • the angular distribution of rays propagating from a light field pinhole projector is determined by the pinhole projector’s focal length f and a corresponding two-dimensional warping function a(u, v).
  • An autostereoscopic light field display projecting a light field for one or more users is defined as D = (M_x, M_y, N_u, N_v, f, a, D_LP), where:
  • (M_x, M_y) are the horizontal and vertical dimensions of the display's spatial resolution and (N_u, N_v) are the horizontal and vertical dimensions of the display's angular resolution components.
  • the display is an array of idealized light field projectors, with pitch D_LP, focal length f, and a warping function a defining the distribution of ray directions for the light field projected by the display.
  • a light field LF(x, y, u, v) driving a light field display D = (M_x, M_y, N_u, N_v, f, a, D_LP) requires M_x light field radiance samples in the x direction, M_y light field radiance samples in the y direction, and N_u and N_v light field radiance samples in the u and v directions.
  • each of the light field planar-parameterized pinhole projectors within the array of idealized light field pinhole projectors may have a unique warping function a, if significant microlens variations exist in a practical pinhole projector causing the angular ray distribution to vary significantly from one microlens to another microlens.
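To make the sample-count statement concrete, here is a minimal Python sketch of the display tuple and the number of radiance samples per frame. The class name `LightFieldDisplay`, its field names, and the example resolution are hypothetical; the warping function is left out, which amounts to assuming an undistorted planar parameterization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LightFieldDisplay:
    m_x: int     # spatial resolution, horizontal (M_x)
    m_y: int     # spatial resolution, vertical (M_y)
    n_u: int     # angular resolution, horizontal (N_u)
    n_v: int     # angular resolution, vertical (N_v)
    f: float     # projector focal length
    d_lp: float  # projector pitch (D_LP)

    def radiance_samples(self) -> int:
        """Total radiance samples needed for one light field frame."""
        return self.m_x * self.m_y * self.n_u * self.n_v


# Illustrative numbers only; they are not taken from the disclosure.
display = LightFieldDisplay(m_x=640, m_y=480, n_u=32, n_v=32, f=1.0, d_lp=1.0)
samples = display.radiance_samples()
```

Even this modest hypothetical display needs several hundred million samples per frame, which is the bandwidth pressure the rest of the description addresses.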
  • FIG. 1 illustrates a light field display representing objects within a volumetric region defined by these two separate viewing frusta, with the inner frustum volume (110) located behind the display surface (300) (i.e., within the display) and the outer frustum volume (210) located in front of the display surface (i.e. outside of the display).
  • various objects shown schematically as prismatic and circular shapes are located at varying depths from the display surface (300).
  • Halle et al. teach a double frustum rendering technique, where the inner frustum volume and outer frustum volume are separately rendered as two distinct light fields.
  • the inner frustum volume LF_O(x, y, u, v) and outer frustum volume LF_P(x, y, u, v) are recombined into the single light field LF(x, y, u, v) through a depth merging process.
  • the technique uses a pinhole camera rendering model to generate the individual elemental images of the light field.
  • Each elemental image, i.e. each rendered planar-parameterized pinhole projector image
  • Halle et al. teach rendering a pinhole projector image at a sampling region of the light field using a standard orthoscopic camera and its conjugate pseudoscopic camera. For a pinhole camera C, the corresponding conjugate camera is denoted as C*.
  • a generalized pinhole camera based on a re-parameterization of an idealized planarly-parameterized pinhole camera is used.
  • a pinhole camera C with a focal length f has light rays defined by a parameterization created by two parallel planes.
  • Pinhole camera C captures an image I_C(u, v), where (u, v) are coordinates in the ray parameterization plane.
  • the scene is typically described as collections of light sources and surfaces or volumes with various material, color and physical optical properties and various viewing camera positions.
  • This rendering computation must be performed rapidly enough to produce an interactive frame rate (e.g. at least 30Hz).
  • the rendering fidelity can be adjusted based on what degree the light transport calculations are approximated, which of course decreases the computational requirements as more approximation is used. It is for this reason that interactive computer graphics generally have a lower visual fidelity than offline rendered graphics where very high-fidelity light transport models are employed.
  • the requirement for interactivity implies a certain frame rate (typically at least 20-30Hz, but often desired to be higher) with a corresponding bandwidth but also an implication of reduced latency to support immediate graphical response to user input.
  • the high bandwidth combined with the latency requirements impose certain challenges in terms of computation.
  • GPUs: graphics processing units
  • CPUs: central processing units
  • the rendering problem becomes that of rendering an image produced by a virtual light field camera.
  • a light field camera (defined elsewhere in more detail), may be viewed as an array of many conventional 2D camera views. This more general camera model results in calculations with a substantially different geometric structure. The result is that the calculations do not map well into the framework of existing accelerated computer graphics hardware.
  • a rendering calculation pipeline procedure is defined. This has traditionally been based on rasterization, but ray-tracing pipelines have been standardized as well (e.g. recently DirectX ray tracing). In either case, the computational hardware architecture is tailored to the form of these pipelines and their associated required calculations, with the ultimate goal being the production of 2D images at video frame rates.
  • the first step in rendering is generally loading the scene description representation from storage in DDR memory or some other memory that is slow relative to calculation hardware clock rates.
  • One notable aspect of light field rendering is that each light field camera rendering pass can be viewed as an array of more or less conventional 2D camera rendering passes. Naively, each of these 2D cameras (hogels) must be rendered twice for the inner and outer hogel, in the method of “double frustum rendering” as suggested by Halle. The number of rays then is 2 for every direction represented by the display.
  • An alternate scheme that is evident from the existing art is to define an inner and outer far clip plane and have rays cast from the outer far clip plane, through hogels on the display surface and end at the inner far clip plane (or vice versa).
  • caching data in a smaller but faster storage can significantly alleviate DDR or slow memory bandwidth constraints, when the computation is structured in such a way that data is redundantly loaded in some kind of coherent, predictable pattern.
  • each hogel's elemental image rendering requires the same scene description in the worst-case, thus identifying a significant potential redundancy to be exploited by caching effectively.
  • Modern ray-tracing techniques for surface rendering of a single camera view of a scene are able to exploit cache coherency by the principle that coherent rays on the image plane often intersect the same geometry (or at least for the primary intersection point).
  • Rasterization from polygons onto a single imaging camera inherently exploits this same coherency, as a single polygon can be cached in hardware once loaded and all of the pixels it intersects can be calculated in a hardware accelerated rasterization process.
  • these same coherency principles can be exploited within the individual 2D camera views that make up a light field if they are rendered using ray tracing or rasterization.
  • the surface buffer can be rendered efficiently using what is effectively a conventional 2D rendering ray-tracing pipeline.
  • the surface buffer is based on the concept of a layered scene decomposition and a sampling scheme as presented herein, which specifies which pixels are to comprise the surface buffer. Based on analysis presented within this specification, it may be shown that with an appropriately chosen layered scene decomposition and sampling scheme, the resulting surface buffer can be determined to contain fewer pixels than the desired rendered output light field image frame, as it can be viewed as a form of a data compression scheme.
  • an appropriately chosen layered scene decomposition and sampling scheme will result in a surface buffer that will contain samples of all the surfaces areas in the scene visible from any of the hogels in the targeted light field camera view.
  • a surface buffer will contain data to enable reconstruction of the light fields associated with each layer and layer subsection. Once reconstructed, these light fields can be merged into a single light field image, representing the desired rendering output, as described elsewhere in this document.
  • the resulting surface buffer can be partitioned into smaller subsets. This partitioning can occur in such a way that each subset of the surface buffer data can be used by itself to reconstruct some portion of the resulting output light field.
  • One practical embodiment involves partitioning layers and subsections whose size is based on the DEI function, then choosing a sampling scheme that includes a small number (e.g. 4) of elemental images per partition, which are then used to reconstruct the un-sampled elemental images within the partition. If this partitioning is chosen appropriately, subsets of the surface buffer can be loaded into a faster cache memory, from which reconstruction and merging calculations can be performed without resorting to repeated loads from slower system memories.
  • an efficient method to render light field video at interactive rates can be described as starting with a 3D description of a scene, rendering a surface buffer, then rendering the final output frame by reconstructing layers and subsections from cached individual partitions of the surface buffer in order to create corresponding portions of the desired output light field image.
  • when rendering is structured in this form, as opposed to applying a brute force methodology which performs light field rendering as a number of conventional 2D rendering passes, less slow memory bandwidth is required, as cache memories are able to be exploited in a structured fashion by partitioning of the surface buffer.
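The partitioning idea described above can be sketched as follows: the hogel grid is split into square partitions, and a small number of elemental images per partition (here the four corners) are kept as samples from which the rest would be reconstructed. The function names and the fixed `size` argument are hypothetical; in the disclosure the partition size is based on the DEI function, which is not reproduced in this example.

```python
def partition_hogels(m_x, m_y, size):
    """Split an m_x by m_y hogel grid into partitions.

    Returns inclusive bounds (x0, y0, x1, y1) for each partition.
    `size` is a stand-in for a partition size derived from DEI.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    parts = []
    for y0 in range(0, m_y, size):
        for x0 in range(0, m_x, size):
            x1 = min(x0 + size, m_x) - 1
            y1 = min(y0 + size, m_y) - 1
            parts.append((x0, y0, x1, y1))
    return parts


def sampled_corners(part):
    """Pick the corner elemental images of a partition as its samples."""
    x0, y0, x1, y1 = part
    # A set removes duplicates when a partition is one hogel wide or tall.
    return sorted({(x0, y0), (x1, y0), (x0, y1), (x1, y1)})


partitions = partition_hogels(m_x=8, m_y=4, size=4)
corners = sampled_corners(partitions[0])
```

Each partition's samples form an independent working set, so a renderer following this sketch could load one partition into fast cache memory, reconstruct its un-sampled elemental images, and move on without revisiting slow memory.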
  • the remaining challenges of the technique as disclosed by Maars et al. are related to quality and speed.
  • the implementation of view independent rasterization to layers, or subsets of a three-dimensional description of a scene, may include obtaining one or more polygons representative of the geometry of objects in the scene.
  • a view independent representation is generated based upon one or more of these polygons.
  • the generated view independent representation is encoded as a portion of a compressed second data set.
  • Figure 17 illustrates a computer-implemented method comprising: receiving a first data set comprising a three-dimensional description of a scene
  • Piao et al. utilize a priori physical properties of a light field in order to identify redundancies in the data.
  • the redundancies are used to discard elemental images based on the observation that elemental images representing neighboring points in space contain significant overlapped information. This avoids performing computationally complex data transforms in order to identify information to discard.
  • Such methods do not utilize depth map information associated with each elemental image.
  • Graziosi et al. propose criteria to sub-sample elemental images based on simple pinhole camera coverage geometry to reduce light field redundancy.
  • the downsampling technique taught by Graziosi et al. is simpler than the complicated basis decompositions often employed in other CODEC schemes for two-dimensional image and video data. Where an object is located deep within a scene, the light field is sampled at a smaller rate. For example, when two separate pinhole cameras provide two different fields of view, there is very little difference from one elemental image to the next elemental image, and the fields of view from the two pinhole cameras overlap. While the views are subsampled based on geometric (triangle) overlap, the pixels within the views are not compressed. Because these pixels can be substantial, Graziosi et al. compress the pixels with standard two-dimensional image compression techniques.
  • Graziosi et al. equate the rendering process with the initial encoding process. Instead of producing all of the elemental images, this method only produces the number needed to reconstruct the light field while minimizing any loss of information.
  • Depth maps are included with the elemental images selected for encoding and the missing elemental images are reconstructed using well-established warping techniques associated with depth image-based rendering (DIBR).
  • DIBR depth image-based rendering
  • the selected elemental images are further compressed using methods similar to the H.264/AVC method, and the images are decompressed prior to the final DIBR-based decoding phase. While this method provides improved compression rates with reasonable signal distortion levels, no time-based performance results are presented. Such encoding and decoding cannot provide good low-latency performance for high bandwidth rates.
  • this method is limited to use for a single object that is far away from the display surface; in scenes with multiple overlapping objects and many objects close to the display surface, the compression would be forced back to use H.264/AVC style encoding.
  • each focal length distance into the scene adds another pixel of angular resolution required to fully represent objects at the given spatial resolution of the display surface.
  • L = (K_1, K_2, L^O, L^P), partitioning the inner and outer frustum volumes of a three-dimensional display. The inner frustum is partitioned into a set of K_1 layers, where . Each inner frustum
  • the layer is defined by a pair of boundaries parallel to the display surface at distances and from the display surface.
  • the outer frustum is partitioned
  • Each outer frustum layer is defined by a
  • the inner and outer frustum volumes may be divided by layering schemes differing from each other and the pair of boundaries can be but also may not be parallel to the display surface.
  • Each of the layered scene decomposition layers has an associated light field (herein also referred to as a“light field layer”) based on the scene restrictions to the planar bounding regions of the layer.
  • This equation is constrained such that only the space at distance d from the light
  • a layered scene decomposition generates a light field for each layer.
  • orthoscopic cameras generate inner frustum volume light fields and pseudoscopic cameras generate outer frustum volume light fields.
  • each point of the light field has an associated depth value which indicates the distance from the generalized pinhole camera plane to the corresponding point in space imaged.
  • a layer's light field is bounded by the depth bounds of the layer itself.
  • a merging operation can re-combine the layered scene decomposition layer sets back into the inner and outer frustum volumes, or LF_O and LF_P.
  • the inner and outer frustum volume light fields are merged with the merging operator *_m.
  • LF 1 (x, y, u, v) and LF 2 (x, y, u, v ), where i is defined as:
  • LF_O(x, y, u, v) and LF_P(x, y, u, v) can be recovered from the sets O^LF and P^LF by merging the light fields associated with the inner and outer frustum layers. For example:
  • the present disclosure provides a layered scene decomposition operation and an inverse operation which merges the data to reverse said decomposition.
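The merging step can be illustrated with a short Python sketch. The exact definition of the merging operator is truncated in the text above, so this example assumes the common depth-merge rule: for each ray (x, y, u, v), keep the sample with the smaller depth. The dictionary representation and the names `merge_pair` and `merge_layers` are hypothetical.

```python
from functools import reduce


def merge_pair(lf_a, lf_b):
    """Depth-merge two light fields.

    Each light field maps a ray (x, y, u, v) to a (color, depth) pair.
    Assumption: the sample with the smaller depth wins on each ray.
    """
    merged = {}
    for ray in lf_a.keys() | lf_b.keys():
        a = lf_a.get(ray)
        b = lf_b.get(ray)
        if a is None:
            merged[ray] = b
        elif b is None:
            merged[ray] = a
        else:
            merged[ray] = a if a[1] <= b[1] else b
    return merged


def merge_layers(layer_light_fields):
    """Fold a list of per-layer light fields back into one light field."""
    return reduce(merge_pair, layer_light_fields, {})


layer_1 = {(0, 0, 0, 0): ("red", 2.0), (1, 0, 0, 0): ("green", 5.0)}
layer_2 = {(0, 0, 0, 0): ("blue", 1.0)}
recovered = merge_layers([layer_1, layer_2])
```

Because the pairwise rule is associative under this assumption, the layers can be folded in any order, which is what lets a decoder merge reconstructed layers incrementally.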
  • Performing a layered scene decomposition with K layers is understood to create K times as many individual light fields.
  • the value of the layered scene decomposition is in the light fields induced by the layers; these light field layers are more suitable for downsampling than the original total light field or the inner frustum volume or outer frustum volume light fields, as the total data size required for multiple downsampled layered scene decomposition light field layers with an appropriate sampling scheme is significantly less than the size of the original light field.
  • sampling scheme S is not intended to limit or depart from the scope and spirit of the invention, as other sampling schemes, such as specifying individual sampling rates for each elemental image in the layered scene decomposition layer light fields, can be employed.
  • Relatively simple sampling schemes can provide an effective CODEC with greater sampling control; therefore the present disclosure provides a simple sampling scheme to illustrate the disclosure without limiting or departing from the scope and spirit of the invention.
  • a light field sampling scheme provided according to the present disclosure represents a light field encoding method. Given a display
  • the present disclosure provides a sampling scheme S associated with L as an M_x × M_y binary matrix associated with any layer l_i in L^O or L^P and a mapping function that maps each layer l_i to a pair (n_x, n_y).
  • a binary ({0,1}) entry in indicates if the elemental image
  • the elemental images in the light field are sampled at a resolution of n_x × n_y.
  • the present disclosure also provides a layered scene decomposition light field encoding process that draws upon plenoptic sampling theory.
  • the following description pertains to the inner frustum volume L^O of a layered scene decomposition L, but the outer frustum volume L^P may be encoded in a similar fashion.
  • the present disclosure creates a sampling scheme S using the following equation to guide the creation of the sampling matrix:
  • DEI guides the distance between “1” entries in the M_S matrix associated with each layered scene decomposition layer.
  • the following equation sets the
  • This sampling scheme using both DEI and N res to drive individual layered scene decomposition layer sampling rates, can be considered as a layered plenoptic sampling theory sampling scheme (otherwise referred to herein as“plenoptic sampling scheme”).
  • This per-layer sampling scheme provides lossless compression for fronto-parallel planar scene objects where the objects within a layer do not occlude each other.
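A minimal Python sketch of the binary sampling matrix follows. The equation for DEI is not reproduced in the text above, so the `spacing` argument here is a hypothetical stand-in for the value DEI would yield for a given layer, and the function name `sampling_matrix` is an assumption of this example.

```python
def sampling_matrix(m_x, m_y, spacing):
    """Build an M_y-row by M_x-column binary matrix for one layer.

    An entry of 1 means the elemental image at that hogel is kept;
    0 means it is discarded and reconstructed at decode time.
    `spacing` stands in for the distance between "1" entries given by DEI.
    """
    if spacing <= 0:
        raise ValueError("spacing must be positive")
    return [[1 if (x % spacing == 0 and y % spacing == 0) else 0
             for x in range(m_x)]
            for y in range(m_y)]


matrix = sampling_matrix(m_x=5, m_y=3, spacing=2)
kept = sum(sum(row) for row in matrix)
```

In this toy case 6 of 15 elemental images are kept. A deeper layer would use a larger spacing and keep fewer elemental images, each sampled at the layer's own directional resolution (n_x, n_y).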
  • the present disclosure therefore provides for the identification as“core” or “residue” information for the encoding and decoding of the light field by the CODEC.
  • the present disclosure considers the encoded, downsampled light fields associated with L and S, as well as the number of layered scene decomposition layers and the depth of said layers, as the“core” representation of a light field encoded and decoded by the CODEC.
  • any additional information transmitted along with the core (encoded) representation of the light field that may be required during the decoding process is considered as the“residue” representation of the light field to be processed by the CODEC and used together with the core representation of the light field to produce the final light field displayed.
  • K i is a natural number describing the number of residue layers required for layer l i .
  • as with layered scene decomposition layers, each residue layer has an associated light field:
  • these additional layers can be free form with no further restrictions.
  • additional information that can help to deal with occlusions is represented in these residue layers.
  • One way to implement this is to have residue layers have the same sampling scheme as their parent layered scene decomposition layer, however one possible variation might be to sample the residue layers with a lower directional resolution in order to tightly control the compression rate of the LSD plus residue layer combination.
  • the residue layers may be defined as additional layers corresponding to the concept of Deep G-Buffers.
  • residue layers sit in contrast to layered scene decomposition layers in the sense that the depth ranges of each layer are not fixed by pre-decided depth divisions of the layered scene decomposition layer scheme but are based on the depth layer characteristics inherent to the geometry in the scene being represented.
  • a geometry buffer (G-buffer) is the name given to an image buffer which stores colors, normals and depth information rendered relative to a particular camera viewpoint.
  • Mara et al. proposed the idea of a deep G-buffer, rendering layered depth images in the context of global illumination calculations for computer graphics in order to capture information that would otherwise be missing.
  • normals, colors and depth values were also stored for each of the layered depth images.
  • the proposed data structure can be used to provide extra geometric information for existing screen space techniques (using the standard G-buffer) in order to improve quality of lighting calculations based on the use of the extra occluded information provided by the deep G-buffer.
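As a loose illustration of the deep G-buffer idea, the sketch below keeps, for every pixel, the nearest few surface fragments instead of only the first one. This is a simplification made for this example: it only sorts by depth and truncates, and it omits any further layer-selection criteria. The names `Fragment` and `deep_g_buffer` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    depth: float   # distance from the camera viewpoint
    color: tuple   # RGB color
    normal: tuple  # surface normal


def deep_g_buffer(fragments_per_pixel, num_layers):
    """Keep the num_layers nearest fragments for each pixel.

    With num_layers == 1 this reduces to a standard G-buffer; extra
    layers retain surfaces that the first layer occludes.
    """
    return {
        pixel: sorted(frags, key=lambda fr: fr.depth)[:num_layers]
        for pixel, frags in fragments_per_pixel.items()
    }


scene = {
    (0, 0): [
        Fragment(5.0, (0, 0, 255), (0.0, 0.0, 1.0)),
        Fragment(2.0, (255, 0, 0), (0.0, 0.0, 1.0)),
        Fragment(9.0, (0, 255, 0), (0.0, 1.0, 0.0)),
    ],
}
buffer = deep_g_buffer(scene, num_layers=2)
```

The second layer at each pixel is exactly the kind of occluded information that residue layers are meant to carry alongside the core representation.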
  • FIG. 14 illustrates a computer-implemented method comprising: receiving a first data set comprising a three-dimensional description of a scene
  • Predictable compression rates are required to create a real-time rendering and transmission system, together with downsampling criteria (which do not indicate achievable compression rates).
  • the following provides a compression analysis of the present disclosure’s layered scene decomposition encoding strategy.
  • the present disclosure provides a downsampling light field encoding strategy, allowing for a low-latency, real-time light field CODEC.
  • complementary sampling schemes based on plenoptic sampling theory using both DEI and N res are employed to drive individual layered scene decomposition layer sampling rates.
  • the layered scene decomposition representing the total 3D scene as a plurality of light fields, expands the scene representation by a factor of the number of layers.
  • the present disclosure further contemplates that when layer depths are chosen appropriately, compression rates can be guaranteed when combined with plenoptic sampling theory based downsampling.
  • the layer's restricted depth range provides a guaranteed compression rate for the layer's light field.
  • the achievable compression ratio from downsampling a scene completely contained within a single layer can be explained in the following theorem:
  • the compression factor term determines
  • the total scene may be optimally decomposed with the maximum depth of field in the layers.
  • layered scene decomposition layers located closer to the display surface achieve a lower compression ratio than layers of the same width located further away from the display surface.
  • layered scene decomposition layers with a narrower width are located closer to the display surface, and wider layered scene decomposition layers are located further away from the display surface; this placement maintains a uniform compression rate across the scene.
  • the consideration of this identity function is not intended to limit the scope or spirit of the present disclosure, as other functions can be utilized.
  • a single layered scene decomposition layer with a front boundary located at depth Z DOF represents the system from Z DOF to infinity.
  • Lossless compression may be defined as a class of data compression algorithms that allows the original data to be perfectly reconstructed from the compressed data.
  • layered scene decomposition layers beyond the deepest layer located at the light field display's maximum depth of field are not considered, as these layers do not provide additional representative power from the core representation perspective; this applies to both the inner and outer frustum volume layer sets.
  • the layered scene decomposition layers utilize maximum and minimum distance depths that are integer multiples of the light field display f value.
  • Layered scene decomposition layers with a narrower width provide better per-layer compression ratios, thereby providing better overall scene compression ratios.
  • a greater number of layers in the decomposition increases the amount of processing required for decoding, as a greater number of layers must be reconstructed and merged.
  • the present disclosure accordingly teaches a layer distribution scheme with differential layer depths.
  • layered scene decomposition layers (and by correlation the light fields represented by said layers) with a more narrow width are located closer to the display surface, and the layer width (i.e., the depth difference between the front and back layer boundaries) increases exponentially as the distance from the display surface increases.
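The differential layer depth scheme can be sketched in Python as below. The text states only that layer boundaries are integer multiples of f and that layer width grows exponentially with distance; the specific choice of doubling the boundary depth at every layer, starting at depth f, is an assumption of this example, as is the function name `layer_boundaries`.

```python
def layer_boundaries(f, max_depth):
    """Return (d_min, d_max) pairs for layers of exponentially growing width.

    Assumption: the first layer starts at depth f and each layer's far
    boundary is twice its near boundary, clipped to max_depth.
    """
    if f <= 0:
        raise ValueError("f must be positive")
    layers = []
    d_min = f
    while d_min < max_depth:
        d_max = min(2 * d_min, max_depth)
        layers.append((d_min, d_max))
        d_min = d_max
    return layers


layers = layer_boundaries(f=1.0, max_depth=16.0)
widths = [d_max - d_min for d_min, d_max in layers]
```

Under this assumption a depth range of 16 focal lengths needs only four layers, which keeps the decoder's reconstruct-and-merge workload small while placing the narrowest layers nearest the display surface.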
  • N res is an example of an effective resolution function. This scheme in theory will sample losslessly, if there are no occlusions and if all objects within layers are fronto-planar.
  • the model derived shows that an asymptotic resolution can be calculated for objects.
  • the asymptotic resolution decreases as a function of observer distance. Therefore, if we can assume a maximum observer distance, it is reasonable to use the corresponding asymptotic resolution or other associated resolution fall-off function as a worst-case measure of resolution degradation at depth.
  • the resolution function can be plotted, as seen in Figure 20, for various directional sampling rates less than that implied by N_res(d_max) (443), to consider how each deviates from the ideal (440) within the depth range from d_min (442) to d_max (443).
  • the deviation will become greater as a function of greater depth, but of course will not exceed the maximum based on the asymptotic value (441).
  • the deviation represents signal loss; however, it can be quantified and has a bound based on the asymptotic value.
  • a three-dimensional description of a scene is partitioned into a plurality of subsets and multiple layers or subsets are encoded to generate a second data set, smaller than a size of the first data set.
  • the encoding of the layer or subset may include performing a sampling operation on the subset.
  • An effective resolution function is used to determine a suitable sampling rate. Elemental images associated with a subsection are then downsampled using the determined suitable sampling rate.
  • Dodgson analyzes how observers can occupy various viewing zones in front of a 3D display which correspond to density of projection from the display's angular components, but does not directly relate these viewing zones to apparent viewing quality of objects at depth.
  • One approach to dealing with small depth of field, or DoF, involves scaling content to fit within the target region. This technique does appear to produce a good result, but since it involves some optimization on the content, it appears it would not be immediately amenable to real-time datasets in an interactive setting. A simpler, fixed scheme re-scaling technique could work in a real-time setting yet would likely introduce unacceptable distortion artifacts such as cardboarding.
  • Cardboarding may be defined as a pervasive artifact that occurs when visualizing 3D content, where objects appear flat due to depth compression.
  • an observer is a pinhole camera.
  • a real human eye is more accurately modeled as a finite aperture camera, which is an approach taken in other 3D display view simulation work.
  • a pinhole camera is used for simplicity, as it can serve as an upper bound on quality over the finite aperture case in some sense.
  • the canonical image forms the basis of the observer image.
  • the canonical rays are considered. More specifically, it is assumed that the observer image can be related to the canonical image I_c[D, O] through a type of upscaling operation.
  • the canonical image is a sampled version of a warped observer image. Applying the inverse warping function to the canonical image, a continuous version of it, would then give the observer image.
  • the warping function can also be described as a projection function.
  • a light field display can represent objects that are within a volumetric region defined by two separate viewing frusta, encompassing an area both behind and in front of the display surface. These two frusta are herein referred to as the inner and outer frustum regions of a given display.
  • the canonical rays, it should be noted, only form a subset of the rays that would contribute to the observer image. It is straightforward to observe that the set of canonical rays of an observer is independent of the observer’s direction D_O and focal length f_O. Thus, the set of all possible observers at a particular location share the same set of canonical rays relative to a display.
  • the canonical rays for a given display and observer may be seen to sample intensity values from the light field projected by the display and its light field projectors. These intensity values may be observed to form an M_x by M_y image, further referred to herein as the canonical image relative to display D and observer O. This image is denoted as I_c[D, O](x, y).
  • Each light field projector represents a segment of a continuous smooth light field. Given a display and an observer, each of the canonical rays samples the light field using the intensities within its corresponding light field projector array. That is, given a canonical ray, an intensity value is reconstructed based on a resampling operation performed
  • the ray vector can be represented using spherical coordinates as .
  • indices ( u n , v n ) represent the light field projector pixel which has minimum angular distance from the sampling canonical ray.
  • Objects in a 2D display, while lacking the additional perceptual cues of a 3D display, do not become unnaturally blurred at distance.
  • 3D objects that appear deep in a scene that are virtually distant from the display surface degrade in a natural way relative to the maximum resolution of the display. That is, as an object becomes more distant from the 2D display, its projected area on the 2D display is less, thus the number of pixels representing this projected area decrease with the size of the area. This corresponds to how more distant objects are projected to a smaller area on the retina (or an imaging plane of a camera) and thus less detail can be resolved.
  • objects at distance become blurry and are not represented at a resolution proportional to their projected area on the display surface.
  • a plane is constructed such that for an observer at $(x_0, z_0)$, at whatever depth the plane is placed, its size is such that the plane projects onto each spatial pixel of the display. In other words, what is seen on the plane will take up the entire space of the display’s surface as shown in Figure 19.
  • the canonical image produced is A canonical ray has an
  • (1) light field projector rays sample the plane and (2) incident canonical rays sample the light field from a subset of the light field projector rays.
  • the problem is simplified by assuming that a canonical ray sample is constructed using just one element of a light field projector through nearest neighbor interpolation.
  • the intensity associated with the ray then is l .
  • each ray is mapped to a corresponding ray in , indexed by as defined previously.
  • adjacent canonical rays are mapped to parallel light field projector rays.
  • two adjacent canonical rays are mapped to distinct rays in their corresponding light field projectors. That
  • the encoded layered scene decomposition representation of a light field produced from a sampling scheme applied to each layer is principally comprised of a plurality of pixels including RGB color and disparity.
  • selecting an appropriate bit width for the disparity (depth) field of the pixel is important, as the width of this field improves the accuracy of the operation during reconstruction.
  • the use of an increased number of bits contributes negatively to the compression rate achieved.
  • each layer of RGB color and disparity pixels specified by the given sampling scheme has a specific range of disparity corresponding to the individual pixels.
  • the present disclosure exploits this narrow range of disparity within each layered scene decomposition layer to increase the accuracy of the depth information.
  • the range of disparity for an entire scene is mapped to a fixed number of values. For example, in 10-bit disparity encoding, there can only be 1024 distinct depth values.
  • the same fixed number of values are applied to each layered scene decomposition layer, as each layer has known depth boundaries. This is advantageous as the transmission bandwidth can be reduced by decreasing the width of the depth channel, while maintaining pixel reconstruction accuracy.
  • if the system implements a disparity width of 8 bits and the scene is decomposed into 8 layered scene decomposition layers, a total of 2048 distinct disparity values can be used, with each layer having 256 distinct possible values based on its 8-bit representation. This is more efficient than mapping the entire range of possible disparity values within the inner or outer frustum to a given number of bits.
  • the present disclosure utilizes the same number of bits, but the bits are interpreted separately and distinctly represent disparity within each layered scene decomposition layer. Since the layered scene decomposition layers are independent of one another, depth (bit) encoding can differ for each layer and can be designed to provide a more accurate fixed-point representation.
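The arithmetic in this example can be checked with a short sketch. The function name `disparity_levels` and its parameters are illustrative assumptions and do not appear in the disclosure.

```python
def disparity_levels(bits_per_layer: int, num_layers: int) -> tuple[int, int]:
    """Return (distinct values per layer, distinct values across all layers)."""
    per_layer = 2 ** bits_per_layer
    return per_layer, per_layer * num_layers


# 8-bit disparity in each of 8 layers, as in the example above.
per_layer, total = disparity_levels(bits_per_layer=8, num_layers=8)

# A single 10-bit encoding of the whole scene, for comparison.
whole_scene_10_bit = 2 ** 10
```

With these inputs the layered scheme offers more distinct disparity values than a single 10-bit range while using a narrower depth channel per pixel.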
  • a layered scene decomposition layer closer to the display surface has smaller depth values and can use a fixed point format with a small number of integer bits and a large number of fractional bits
  • layered scene decomposition layers further away from the display surface have larger depth values and can use a fixed point format with a large number of integer bits and a small number of fractional bits.
  • the fractional bits are configurable on a per layer basis:
  • $\text{MinFixedPoint} = 1 / 2^{\text{FractionalBits}}$
  • $\text{MaxFixedPoint} = 2^{16 - \text{FractionalBits}} - \text{MinFixedPoint}$
  • Disparity is calculated from the depth in the light field post-processing stage and encoded using the following formula:
  • $\text{ScaleFactor} = (\text{MaxFixedPoint} - \text{MinFixedPoint}) / (\text{NearClipDisparity} - \text{FarClipDisparity})$
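The per-layer fixed-point quantities above can be sketched as follows. The range and scale-factor computations follow the formulas as stated, assuming a 16-bit word. The final mapping in `encode_disparity` (a linear map that sends the far clip disparity to MinFixedPoint and the near clip disparity to MaxFixedPoint) is an assumption added for illustration, since the text here gives only the scale factor. All function names are hypothetical.

```python
def fixed_point_range(fractional_bits: int, total_bits: int = 16) -> tuple[float, float]:
    """Smallest and largest representable values for the chosen format."""
    min_fp = 1.0 / (2 ** fractional_bits)
    max_fp = 2 ** (total_bits - fractional_bits) - min_fp
    return min_fp, max_fp


def scale_factor(fractional_bits: int, near_clip: float, far_clip: float) -> float:
    """ScaleFactor as defined above, for one layer's disparity bounds."""
    min_fp, max_fp = fixed_point_range(fractional_bits)
    return (max_fp - min_fp) / (near_clip - far_clip)


def encode_disparity(disparity: float, fractional_bits: int,
                     near_clip: float, far_clip: float) -> float:
    """ASSUMPTION: linear map of [far_clip, near_clip] onto [min_fp, max_fp]."""
    min_fp, _ = fixed_point_range(fractional_bits)
    scale = scale_factor(fractional_bits, near_clip, far_clip)
    return (disparity - far_clip) * scale + min_fp
```

A layer close to the display surface would pass a larger `fractional_bits`, trading integer range for precision, in line with the per-layer configuration described above.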
  • Figure 18 illustrates a computer-implemented method comprising: receiving a first data set comprising a three-dimensional description of a scene
  • the present disclosure defines an encoder-decoder for various types of angular pixel parameterizations, such as, but not limited to, planar parameterizations, arbitrary display parameterizations, a combination of parameterizations, or any other configuration or parameterization type.
  • a generalized and illustrative embodiment of the present disclosure provides a method to generate a synthetic light field for multi-dimensional video streaming, multi-dimensional interactive gaming, or other light field display scenarios.
  • a rendering system and processes are provided that can drive a light field display with real time interactive content.
  • the light field display does not require long-term storage of light fields; however, the light fields must be rendered and transmitted at low latency to support an interactive user experience.
  • Figure 7 provides a CODEC system overview of the generalized, illustrative embodiment of the present invention.
  • a gaming engine or interactive graphics computer (70) transmits three-dimensional scene data to GPU (71).
  • the GPU encodes the data and sends it over the display port (72) to a decoding unit (73) containing a decoding processor such as an FPGA or ASIC.
  • the decoding unit (73) sends decoded data to a light field display (74).
  • Figure 1 illustrates another generalized, exemplary layered scene decomposition CODEC system, where light field data from a synthetic or video data source (50) is input to encoder (51).
  • a GPU (43) encodes the inner frustum volume data, dividing it into a plurality of layers
  • GPU (53) encodes the outer frustum volume data, dividing it into an additional plurality of layers. While Figure 1 illustrates separate GPUs (43, 53) dedicated for the inner and outer frustum volume layers, a single GPU can be utilized to process both the inner and outer frustum volume layers.
  • Each of the layered scene decomposition layers is transmitted to decoder (52), where the plurality of inner frustum volume layers (44(1) through 44(*)) and the plurality of outer frustum volume layers (54(1) through 54(*)) of a light field are decoded and merged into a single inner frustum volume layer (45) and a single outer frustum volume layer (55).
  • the inner and outer frustum volumes are then synthesized (merged) into a single, reconstructed set of light field data (56), otherwise referred to herein as a “final light field” or “display light field”.
  • Figures 10 to 13 illustrate exemplary CODEC process implementations according to the present disclosure.
  • Figure 10 illustrates an exemplary layered scene decomposition CODEC method, whereby 3D scene data in the format of image description or light field data is loaded to an encoder (400) for encoding, whereupon data (sub)sets as illustrated in the figure, or alternatively the entire data set representing the 3D scene is partitioned (403).
  • the identification process is a general process step reference which is intended to simply refer to the ability to partition the data set in one pass, or in groupings (e.g. to encode inner frustum and outer frustum data layers as illustrated in more detail in Figure 11), as may be desired according to the circumstances.
  • the identification of data subsets may imply pre-encoding processing steps or processing steps also forming part of the encoding sub-process stage (401).
  • Data subsets may be tagged, specified, confirmed, scanned and even compiled or grouped at the time of partitioning to produce a set of layers (decomposition of the 3D scene) (403).
  • each data layer is sampled and rendered according to the present disclosure to produce compressed (image) data (404).
  • the compressed data is transmitted to a decoder (405) for the decoding sub-process (406) comprising decompression, decoding and re-composition steps to (re)construct a set of light fields (407), otherwise referred to herein as “layered light fields”, layered light field images and light field layers.
  • the constructed layered light fields are merged to produce the final light field (408) displaying the 3D scene (409).
  • An exemplary, parallel CODEC process is illustrated in Figure 13 for optimizing the delivery of a light field representing a 3D scene in real-time (e.g. to minimize artifacts).
  • the process comprises the steps of loading 3D scene data to an encoder (700), encoding and compressing the residue encoded representation (701) of the final light field, transmitting the residue encoded representation (702) to a decoder, decoding the residue encoded representation and using the residue encoded representation with the core encoded representation to produce the final light field (703) and display the 3D scene at a display (704).
  • Figure 11 illustrates an embodiment related to the embodiment shown in Figure 10, in which two data (sub)sets derived from the 3D scene data (500), namely the inner frustum layer (502) and the outer frustum layer (503), are identified for partitioning (501). Each data set is partitioned into layers of differential depths according to its own layering scheme (504, 505), i.e. each data set becomes a plurality of data layers.
  • Each set (plurality) of data layers (506, 507) representing an inner frustum and outer frustum volume of a light field display respectively are subsequently sampled on a per layer basis according to sampling scheme (508, 509); and each sampled layer is rendered to compress the data and produce two sets of compressed (image) data (510, 511) in process steps (508, 509), respectively.
  • the sets of compressed data (510, 511) encoding the sets of light fields corresponding to the sets of data layers (506, 507), are then combined (512) to produce a layered, core encoded representation (513) (CER) of a final (display) light field.
  • Figure 12 illustrates an embodiment of a CODEC method or process to reconstruct a set of light fields and produce a final light field at a display.
  • the set of light fields (layered light fields) is (re)constructed from the core encoded representation (513) using multi-stage view synthesis protocols (600).
  • a protocol (designated as VS1-VS8) is applied (601-608) to each of the eight layers of the core encoded representation (513), which protocols may or may not be different depending on characteristics of each data layer light field to be decoded.
  • Each protocol may apply a form of non-linear interpolation, termed herein edge adaptive interpolation (609), to provide good image resolution and sharpness in the set(s) of layered light fields (610) reconstructed from the core encoded representation of said fields.
  • the layered light fields (610) are merged, in this case illustrating the merging of two sets of light fields (611, 612) corresponding to two data subsets to produce two sets of merged light fields (613, 614).
  • the merged sets of light fields (613, 614) may represent, for example, the inner frustum and outer frustum volumes of a final light field and can be accordingly merged (615) to produce said final light field (616) at a display.
  • Encoding according to the present disclosure is designed to support the generation of real-time interactive content (for example, for gaming or simulation environments) as well as existing multi-dimensional datasets captured through light field generalized pinhole cameras or camera arrays.
  • For a light field display D, a layered scene decomposition L, and a sampling scheme S, the system encoder produces the elemental images associated with the light fields corresponding to each layered scene decomposition layer included in the sampling scheme. Each elemental image corresponds to a generalized pinhole camera. The elemental images are sampled at the resolution specified by the sampling scheme and each elemental image includes a depth map.
  • the set of generalized pinhole cameras specified by the encoding scheme for a given layered scene decomposition layer can be systematically rendered using standard graphics viewport rendering.
  • This rendering method results in a high number of draw calls, particularly for layered scene decomposition layers with sampling schemes including large numbers of the underlying elemental images. Therefore, in a system utilizing layered scene decomposition for realistic, autostereoscopic light field displays, this rendering method alone does not provide real-time performance.
  • a rendering technique utilizing standard graphics draw calls restricts the rendering of a generalized pinhole camera's planar parameterizations (identity function a) to perspective transformations.
  • Hardware-optimized rasterization functions provide the performance required for high-quality real-time rendering in traditional two-dimensional displays. These accelerated hardware functions are based on planar parameterizations.
  • parallel oblique projections can utilize standard rasterized graphics pipelines to render generalized pinhole camera planar parameterizations.
  • the present disclosure contemplates the application of rasterization to render the generalized pinhole camera views by converting sets of triangles into pixels on the display surface.
  • oblique rendering reduces the number of rendering passes required for each layered scene decomposition layer and can accommodate any arbitrary identity function a.
  • the system utilizes one parallel oblique projection per angle specified by the identity function a.
  • the system executes a "slice and dice" block transform (see United States Patent Nos. 6,549,308 and 7,436,537) to re-group the stored data from its by-angle grouping into an elemental image grouping.
  • the "slice and dice" method alone is inefficient for real-time interactive content, as it requires many separate oblique rendering draw calls when a large number of angles are to be rendered.
  • An arbitrary identity function a can also be accommodated by a ray-tracing rendering system.
  • ray tracing specifying arbitrary angles does not require higher performance than accepting planar parameterizations.
  • rasterization provides more reliable performance scalability than ray tracing rendering systems.
  • the present disclosure provides several hybrid rendering approaches to efficiently encode a light field.
  • encoding schemes render layered scene decomposition layers located closer to the display surface with more images requiring fewer angular samples, and layers located further away from the display surface with fewer images and more angular samples.
  • perspective rendering, oblique rendering, and ray tracing are combined to render layered scene decomposition layers; these rendering techniques can be implemented in a variety of interleaved rendering methods.
  • one or more light fields are encoded by a GPU rendering an array of two-dimensional pinhole cameras.
  • the rendered representation is created by computing the pixels from the sampling scheme applied to each of the layered scene decomposition layers.
  • a pixel shader performs the encoding algorithm.
  • Typical GPUs are optimized to produce a maximum of 2 to 4 pinhole camera views per scene in one transmission frame.
  • the present disclosure requires rendering hundreds or thousands of pinhole camera views simultaneously, thus multiple rendering techniques are employed to render data more efficiently.
  • the generalized pinhole cameras in the layered scene decomposition layers located further away from the display surface are rendered using standard graphics pipeline viewport operations, known as perspective rendering.
  • the generalized pinhole cameras in the layered scene decomposition layers located closer to the display surface are rendered using the "slice and dice" block transform. Combining these methods provides high efficiency rendering for layered plenoptic sampling theory sampling schemes.
  • the present disclosure provides layered scene decomposition layers wherein layers located further away from the display surface contain a smaller number of elemental images with a higher resolution and layers located closer to the display surface contain a greater number of elemental images with a lower resolution.
  • Rendering the smaller number of elemental images in the layers further away from the display surface with perspective rendering is efficient, as the method requires only a single draw call for each elemental image.
  • perspective rendering becomes inefficient for layers located closer to the display surface, as these layers contain a greater number of elemental images, requiring an increased number of draw calls. Since elemental images in layers located closer to the display surface correspond to a relatively small number of angles, oblique rendering can efficiently render these elemental images with a reduced number of draw calls.
  • a process to determine where the system should utilize perspective rendering, oblique rendering, or ray tracing to render the layered scene decomposition layers is provided.
  • each layered scene decomposition layer is evaluated to compare the number of elemental images to be rendered (i.e., the number of perspective rendering draw calls) to the size of the elemental images required at the particular layer depth (i.e., the number of oblique rendering draw calls), and the system implements the rendering method (technique) requiring the fewest rendering draw calls.
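The per-layer comparison described above can be sketched as a simple draw-call count. The function name, its parameters, and the tie-breaking rule (preferring perspective rendering on a tie) are assumptions made for illustration. Ray tracing is left out because the text gives no draw-call measure for it.

```python
def choose_render_method(num_elemental_images: int, elemental_image_pixels: int) -> str:
    """Pick the technique needing the fewest draw calls for one layer.

    Perspective rendering: one draw call per elemental image.
    Oblique rendering: one draw call per angle, i.e. per elemental image pixel.
    """
    perspective_calls = num_elemental_images
    oblique_calls = elemental_image_pixels
    # ASSUMPTION: on a tie, fall back to the standard perspective pipeline.
    return "perspective" if perspective_calls <= oblique_calls else "oblique"


# A far layer: few, high-resolution elemental images.
far_layer = choose_render_method(num_elemental_images=4, elemental_image_pixels=64 * 64)

# A near layer: many, low-resolution elemental images.
near_layer = choose_render_method(num_elemental_images=10_000, elemental_image_pixels=4 * 4)
```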
  • an alternative rendering method renders layers located closer to the display surface, or a portion of the layers located closer to the display surface, using ray tracing.
  • each pixel in a layered scene decomposition layer is associated with a light ray defined by the light field.
  • Each ray is cast and the intersection with the layered scene decomposition is computed as per standard ray tracing methodologies.
  • Ray tracing is advantageous when rendering an identity function a which does not adhere to the standard planar parameterizations expected by the standard GPU rendering pipeline, as ray tracing can accommodate the arbitrary ray angles that are challenging for traditional GPU rendering.
  • the valid viewing zone is a set of all locations in space where an observer can view every hogel on the display at an angle within the field of view and as a result receives a pixel from each hogel. This zone will be where the projection frustum of every hogel intersects.
  • the definition of the valid viewing zone can be effectively slimmed down to where the projection frusta of the four corner hogels intersect.
  • the corners are the most extreme cases, so if a location is within the projection frusta of the four corners the location is also within the valid viewing zone.
  • This approach also introduces the concept of a maximum viewing distance which is the constraint introduced in order to realize these savings and efficiencies.
  • the viewing frustum is a rectangular pyramid whose tip is oriented along the negative display normal and whose base is at an infinite depth from the display (i.e. a standard frustum).
  • the rectangular pyramid now has its base at a distance equal to the maximum viewing distance.
  • the approach taken to realize savings is not rendering or sending pixels that will not be projected into the valid viewing zone and are therefore wasteful.
  • the number of pixels that are needed for a specified maximum viewing distance is the hogel fill factor.
  • the hogel fill factor is the ratio between the viewing zone size and the hogel projection size at a given depth (i.e. in 2D, if the hogel projection has a width of 1 m and the viewing zone has a width of 0.5 m, then only half the projected pixels are needed).
  • DW represents the display width in meters
  • MVD is the minimum view distance (in meters)
  • FOV is the field of view (in degrees).
  • the maximum viewing distance is defined as MVD + y, where y represents the size of usable range in meters.
  • angle b is equal to angle a, where angle b is equal to the field of view (in degrees).
  • the width of the viewing zone, labeled c is defined by the equation:
  • the hogel fill factor in 2D is the ratio between c and e, therefore:
  • the result of increasing or decreasing the hogel fill factor is an increase or decrease in maximum viewing depth respectively.
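The 2D hogel fill factor described above is a plain ratio, sketched below. The function and parameter names are hypothetical. The viewing zone width and hogel projection width are taken as inputs, since their defining equations are not reproduced in this text.

```python
def hogel_fill_factor(viewing_zone_width_m: float, hogel_projection_width_m: float) -> float:
    """Ratio of viewing zone width to hogel projection width at a given depth (2D case)."""
    if hogel_projection_width_m <= 0:
        raise ValueError("hogel projection width must be positive")
    return viewing_zone_width_m / hogel_projection_width_m


# Example from the text: a 0.5 m viewing zone inside a 1 m hogel projection.
fill = hogel_fill_factor(viewing_zone_width_m=0.5, hogel_projection_width_m=1.0)
```

A fill factor of 0.5 means that only half of the projected pixels land inside the valid viewing zone and need to be rendered or sent.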
  • a strategy to produce a corrected light field is to rasterize a light field and then apply a per pixel warp operation. Where the pixel is supposed to go is determined by a characterization routine that involves imaging the display. How a light field is warped depends on an equation whose form does not change but whose coefficients do change. These coefficients are unique to each display. The idea behind the correction (but not literally how it works) is that if a pixel was supposed to be at X but instead was measured at X+0.1, the pixel would be warped to location X-0.1 in anticipation that it will be measured at X. The goal is to have the measured location match the intended location.
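The intuition behind the correction can be written as a one-line offset. This is only a sketch of the idea as stated, not the per-display warp equation, whose form and coefficients are not given here. The function name is hypothetical.

```python
def prewarp_target(intended: float, measured: float) -> float:
    """Shift a pixel by the opposite of its measured displacement.

    If a pixel meant for `intended` is observed at `measured`, drawing it at the
    returned location anticipates the same displacement bringing it back.
    """
    error = measured - intended
    return intended - error


# Pixel intended for X = 2.0 but measured at X + 0.1.
target = prewarp_target(intended=2.0, measured=2.1)
```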
  • Screen space ray tracing: an alternative to the warping approach to view synthesis is screen space ray tracing.
  • McGuire et al. propose application of screen space ray tracing to multiple depth layers (for robustness). These depth layers are those that are produced by depth peeling.
  • the depth peeling algorithm is slow; therefore, when using modern GPUs, single-pass methods are preferred, e.g., Mara et al., based around reverse reprojection, multiple viewport, and multiple rasterization.
  • Decoding according to the present disclosure is designed to exploit the encoding strategy (sampling and rendering).
  • the core representation as a set of layered light fields from a downsampled layered scene decomposition is decoded to reconstruct the light fields $LF^O$ and $LF^P$.
  • elemental images are decoded by reconstructing the light fields $LF^O$ and $LF^P$ from deconstructed $LF^O$ and $LF^P$ light fields downsampled as specified by sampling scheme S.
  • the pixels align such that the inner and outer frustum volume layers located closer to the display surface are reviewed first, moving to inner and outer frustum volume layers located further away from the display surface until a non-empty pixel is located, and the data from the non-empty pixel is transmitted to the empty pixel closer to the display surface.
  • particular implementations may restrict viewing to the inner frustum volume or the outer frustum volume of the light field display, thereby requiring the decoding of only one of $LF^O$ or $LF^P$.
  • a decoding process is represented by the following pseudocode:
  • a similar procedure reconstructs $LF^P$.
  • Each layered scene decomposition layer is reconstructed from the limited samples defined by the given sampling scheme S.
  • The inner frustum volume layers or the outer frustum volume layers are merged to reproduce $LF^O$ or $LF^P$.
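The decode-then-merge flow described above can be sketched as follows. This is not the pseudocode of the disclosure, which is not reproduced in this text. It is a minimal illustration that assumes each layer is a 2D grid with `None` marking an empty pixel, that layers are ordered from closest to the display surface to furthest, and that `recon_lf` is a caller-supplied stand-in for ReconLF.

```python
EMPTY = None  # ASSUMPTION: empty pixels are represented by None


def merge_layers(layers):
    """Merge reconstructed layers, nearest to the display surface first.

    For each pixel, layers are reviewed from near to far until a non-empty
    pixel is found, and that value fills the merged pixel.
    """
    rows, cols = len(layers[0]), len(layers[0][0])
    merged = [[EMPTY] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            for layer in layers:
                if layer[y][x] is not EMPTY:
                    merged[y][x] = layer[y][x]
                    break
    return merged


def decode_frustum(sampled_layers, recon_lf):
    """Reconstruct every layer with recon_lf, then merge into one light field."""
    return merge_layers([recon_lf(layer) for layer in sampled_layers])


near = [[EMPTY, "a"]]
far = [["b", "c"]]
merged = decode_frustum([near, far], recon_lf=lambda layer: layer)
```

The same routine would be run once for the inner frustum volume layers and once for the outer frustum volume layers.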
  • ReconLF can be executed in various forms with varying computational and post-CODEC image quality properties.
  • ReconLF may be defined as a function such that, given a light field associated with a layer that has been sampled according to a given sampling scheme S, and the corresponding depth map for the light field, it reconstructs the full light field that has been sampled.
  • the ReconLF input is the subset of data defined by the
  • Depth-Image Based Rendering (DIBR), as described by Graziosi et al., can reconstruct the input light field.
  • DIBR can be classified as a projection rendering method.
  • ray-casting methods, such as the screen space ray casting taught by Widmer et al., can reconstruct the light fields. Ray casting enables greater flexibility than re-projection but increases computational resource requirements.
  • the interpolated value can exhibit significant artifacts, as information from across the edge boundary is included in the interpolation operation.
  • the synthesized image is generated with a “smeared” or blurred edge.
  • the present disclosure provides a back-projection technique for the interpolation substep, producing a high-quality synthesized image without smeared or blurred edges.
  • the present disclosure introduces edge-adaptive interpolation (EAI), where the system incorporates depth map information to identify the pixels required by the interpolation operation to calculate the colour of the warped pixels in a reference image.
  • EAI is a nonlinear interpolation procedure that adapts and preserves edges during low-pass filtering operations.
  • the present disclosure utilizes the depth map $D_m(I_t)$, pinhole camera parameters ($f$, $a$, etc.) and the relative position of the display’s array of planar-parameterized pinhole projectors to warp each $I_t$ pixel integer $(x, y)$ to a real-number position $(x_w, y_w)$ in $I_r$.
  • a value must be reconstructed based on $I_r$ integer samples.
  • Linear interpolation methods known in the art reconstruct $I_r(x_w, y_w)$ from the four nearest integer coordinates located in a 2 x 2 pixel neighborhood. Alternate reconstruction methods use larger neighborhoods (such as 3 x 3 pixel neighborhoods), generating similar results with varying reconstruction quality (see Marschner et al., “An evaluation of reconstruction filters for volume rendering”). These linear interpolation methods have no knowledge of the underlying geometry of the signal. Smeared or blurred edges occur when the reconstruction utilizes pixel neighbors belonging to different objects, separated by an edge in the images. The erroneous inclusion of colour from other objects creates ghosting artifacts. The present disclosure remedies this reconstruction issue by providing a method to weight or omit pixel neighbors by using the depth map $D_m(I_r)$ to predict the existence of edges created when a plurality of objects overlap.
  • FIG. 3A illustrates textures (80,83), where a sampling location, illustrated as a black dot (86), is back-projected into another image being reconstructed.
  • the sampling location (86) is located near the boundary of a dark object (87) with a white background (88).
  • in a first reconstruction matrix (81), the full 2x2 pixel neighborhood, with each single white pixel represented by a square (89), reconstructs the sampling location (86) value using a known technique such as linear interpolation. This results in a non-white pixel (82), as the dark object (87) is included in the reconstruction.
  • the second reconstruction matrix (84) uses the EAI technique of the present disclosure, reconstructing the sampling location (86) from the three neighboring single white pixels (90). EAI detects the object edge and omits the dark pixel (87), resulting in the correct white pixel reconstruction (85).
  • the threshold is a feature size parameter.
  • the weight function determines how to reconstruct
  • the Recon function can be a simple modified linear interpolation, where the weights are incorporated with standard weighting procedures and re-normalized to maintain a total weight of 1.
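A minimal sketch of an edge-adaptive variant of bilinear interpolation follows, in the spirit of the modified linear interpolation described above. The specific weight rule (a neighbor is dropped when its depth differs from a reference depth by more than the threshold) and the fallback to plain bilinear interpolation are assumptions for illustration. The disclosure's own weight function is not reproduced in this text, and all names are hypothetical.

```python
def edge_adaptive_interpolate(colors, depths, fx, fy, ref_depth, threshold):
    """Interpolate inside a 2x2 neighborhood while skipping pixels across an edge.

    colors, depths: [[top_left, top_right], [bottom_left, bottom_right]]
    fx, fy: fractional position of the sample inside the neighborhood, in [0, 1]
    ref_depth: depth of the surface being reconstructed (ASSUMPTION)
    threshold: feature size parameter used to detect a depth edge
    """
    bilinear = [[(1 - fx) * (1 - fy), fx * (1 - fy)],
                [(1 - fx) * fy, fx * fy]]
    total_weight = 0.0
    value = 0.0
    for j in range(2):
        for i in range(2):
            same_surface = abs(depths[j][i] - ref_depth) <= threshold
            weight = bilinear[j][i] if same_surface else 0.0
            total_weight += weight
            value += weight * colors[j][i]
    if total_weight == 0.0:
        # ASSUMPTION: with no usable neighbor, fall back to plain bilinear.
        return sum(bilinear[j][i] * colors[j][i] for j in range(2) for i in range(2))
    # Re-normalize so the surviving weights sum to 1.
    return value / total_weight


# Three white background pixels (1.0) and one dark object pixel (0.0) at another depth.
colors = [[1.0, 1.0], [1.0, 0.0]]
depths = [[1.0, 1.0], [1.0, 5.0]]
eai_value = edge_adaptive_interpolate(colors, depths, 0.5, 0.5, ref_depth=1.0, threshold=0.5)
plain_value = edge_adaptive_interpolate(colors, depths, 0.5, 0.5, ref_depth=1.0, threshold=10.0)
```

With the tight threshold the dark pixel is omitted and the sample stays white, matching the behavior described for reconstruction matrix (84). With a threshold large enough to admit all four neighbors, the result is the greyed value that ordinary linear interpolation produces.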
  • the present disclosure also provides a performance-optimized decoding method for reconstructing the layered scene decomposition.
  • LF O can be reconstructed by decoding the elemental images specified by sampling scheme S.
  • the ReconLF method for particular layers does not include inherent constraints regarding the order that the missing pixels of the missing elemental images are to be reconstructed. It is an object of the present disclosure to reconstruct missing pixels using a method that maximizes throughput; a light field large enough for an effective light field display requires an exceptional amount of data throughput to provide content at an interactive frame rate, therefore improved reconstruction data transmission is required.
  • Figure 3B illustrates a general process flow for reconstructing a pixel array.
  • the pixels are reconstructed in two basic passes. Each pass operates in separate dimensions of the array of elemental images; the system executes the first pass as a column decoding, and the second pass as a row decoding, to reconstruct each of the pixels. While the present disclosure describes a system employing column decoding followed by row decoding, this is not meant to limit the scope and spirit of the invention, as a system employing row decoding followed by column decoding can also be utilized.
  • the elemental images specified by sampling scheme S are used as reference pixels to fill in missing pixels.
  • Figure 4 illustrates the elemental images in the matrix as B, or blue pixels (60).
  • the missing pixels (61) are synthesized strictly from reference pixels in the same column.
  • Figure 5 illustrates schematically a column-wise reconstruction of a pixel matrix, as part of the image (pixel) reconstruction process showing column-wise reconstruction (63) of red pixels (62) and blue pixels (60). These newly synthesized column-wise pixels are shown as R, or red pixels (62) in Figure 5 next to blue pixels (60) and missing pixels (61).
  • Newly reconstructed pixels are written to a buffer and act as further pixel references for the second pass, which reconstructs pixels from reference pixels located in the same row as other elemental images.
  • Figure 6 illustrates a subsequent row-wise reconstruction (64) of the pixel matrix, as part of the image (pixel) reconstruction process alongside the column-wise reconstruction (63). These newly synthesized row-wise pixels are shown as G, or green pixels (65) next to blue pixels (60) and red pixels (62).
  • a process for reconstructing a pixel array is represented by the following pseudocode algorithm:
  • This performance-optimized decoding method allows the row-decoding and column-decoding constraints to limit the effective working data set required for reconstruction operations.
  • the reduced dataset can be stored in a buffer while rows and columns of missing elemental images are being reconstructed, thereby providing improved data transmission.
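The two-pass scheme described above (column decoding, then row decoding) can be sketched as follows. This is not the pseudocode of the disclosure, which is not reproduced in this text. The grid representation, the `synthesize` callback, and the `nearest_reference` stand-in (which simply copies the closest reference) are assumptions for illustration; a real decoder would perform view synthesis here. The labels B, R and G mirror the pixel colours used in Figures 4 to 6.

```python
def nearest_reference(refs, index):
    """Stand-in synthesis: copy the closest reference (lower index wins ties)."""
    nearest = min(refs, key=lambda k: (abs(k - index), k))
    return refs[nearest]


def reconstruct_two_pass(grid, synthesize=nearest_reference):
    """Fill missing entries (None) of a 2D grid in a column pass, then a row pass."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    labels = [["B" if v is not None else None for v in row] for row in grid]

    # Pass 1: column decoding, strictly from reference entries in the same column.
    for c in range(cols):
        refs = {r: grid[r][c] for r in range(rows) if grid[r][c] is not None}
        if not refs:
            continue
        for r in range(rows):
            if out[r][c] is None:
                out[r][c] = synthesize(refs, r)
                labels[r][c] = "R"

    # Pass 2: row decoding, using original and pass-1 entries in the same row.
    for r in range(rows):
        refs = {c: out[r][c] for c in range(cols) if out[r][c] is not None}
        if not refs:
            continue
        for c in range(cols):
            if out[r][c] is None:
                out[r][c] = synthesize(refs, c)
                labels[r][c] = "G"
    return out, labels


sampled = [[1, None, 2],
           [None, None, None],
           [3, None, 4]]
filled, labels = reconstruct_two_pass(sampled)
```

Each pass touches only one column or one row at a time, which is what keeps the working data set small enough to hold in a buffer.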
  • the layers are merged into a single inner display volume layer and a single outer display volume layer.
  • the layered scene decomposition layers can be partially decompressed in a staged decompression or can be fully decompressed simultaneously. Algorithmically, the layered scene decomposition layers can be decompressed through a front-to-back or back-to-front process. The final double frustum merging process combines the inner and outer display volume layers to create the final light field for the light field display.
  • Layered scene decomposition achieves multi-dimensional scene decomposition into layers, or subsets and elemental images, or subsections.
  • Machine learning is emerging as a learning-based method of view synthesis.
  • a layered scene decomposition provides a method of downsampling a light field following its decomposition into layers. Previously, this was considered in the context of opaque surface rendering with Lambertian shaded surfaces. What is desired is a method of downsampling light fields, as before, that can also be applied with higher-order lighting models, including semi-transparent surfaces, for example, direct volume rendering based lighting models.
  • Volume rendering techniques include but are not limited to direct volume rendering (DVR), texture-based volume rendering, volumetric lighting, two-pass volume rendering with shadows, or procedural rendering.
  • Direct volume rendering is a rendering process that maps a volume data set (e.g. a voxel-based sampling of a scalar field) to a rendered image without intermediary geometry (no isosurface).
  • the scalar field defined by the data is considered as a semi-transparent, light emitting medium.
  • a transfer function specifies how the field is mapped to opacity and color, and a ray-casting procedure then accumulates the local color and opacity along paths from a camera through the volume.
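
The transfer function and ray-casting accumulation described above can be sketched as follows. This is a minimal illustration, not the disclosed renderer: the `transfer_function` ramp, the opacity scale of 0.5, and the early-termination threshold are all hypothetical choices, and the accumulation uses standard front-to-back compositing.

```python
def transfer_function(density):
    """Hypothetical transfer function: map a scalar in [0, 1] to (rgb, opacity)."""
    color = (density, 0.5 * density, 1.0 - density)  # arbitrary RGB ramp
    opacity = 0.5 * min(1.0, max(0.0, density))      # arbitrary opacity scale
    return color, opacity


def cast_ray(samples):
    """Accumulate color and opacity front to back along one ray.

    `samples` holds scalar field values ordered from the camera outward.
    """
    acc_color = [0.0, 0.0, 0.0]
    acc_alpha = 0.0
    for density in samples:
        color, alpha = transfer_function(density)
        # Only the light not yet absorbed by nearer samples contributes.
        weight = (1.0 - acc_alpha) * alpha
        for c in range(3):
            acc_color[c] += weight * color[c]
        acc_alpha += weight
        if acc_alpha >= 0.99:  # early ray termination (assumed threshold)
            break
    return acc_color, acc_alpha
```

Calling `cast_ray` once per display ray, with samples taken at regular steps through the volume, yields one pixel of the rendered image.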
  • Levoy (1988) first presented that direct volume rendering methods generate images of a 3D volumetric data set without explicitly extracting geometric surfaces from the data. Kniss et al.
  • volume data is stored as a stack of 2D texture slices or as a single 3D texture object.
  • voxel denotes an individual "volume element," similar to the terms pixel for "picture element" and texel for "texture element."
  • Each voxel corresponds to a location in data space and has one or more data values associated with it. Values at intermediate locations are obtained by interpolating data at neighboring volume elements. This process is known as reconstruction and plays an important role in volume rendering and processing applications.
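
As an illustration of reconstruction, the sketch below interpolates a value at an intermediate location from the eight neighboring voxels (trilinear interpolation). The nested-list layout `volume[z][y][x]` and the clamping at the grid border are assumptions made for the example; other reconstruction filters are equally possible.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b."""
    return a + (b - a) * t


def trilinear(volume, x, y, z):
    """Sample volume[z][y][x] (nested lists) at a fractional position."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    # Clamp so the 2x2x2 neighbourhood stays inside the grid.
    x = min(max(x, 0.0), nx - 1.0)
    y = min(max(y, 0.0), ny - 1.0)
    z = min(max(z, 0.0), nz - 1.0)
    x0, y0, z0 = int(x), int(y), int(z)
    x1, y1, z1 = min(x0 + 1, nx - 1), min(y0 + 1, ny - 1), min(z0 + 1, nz - 1)
    tx, ty, tz = x - x0, y - y0, z - z0
    # Interpolate along x, then y, then z.
    c00 = lerp(volume[z0][y0][x0], volume[z0][y0][x1], tx)
    c01 = lerp(volume[z0][y1][x0], volume[z0][y1][x1], tx)
    c10 = lerp(volume[z1][y0][x0], volume[z1][y0][x1], tx)
    c11 = lerp(volume[z1][y1][x0], volume[z1][y1][x1], tx)
    return lerp(lerp(c00, c01, ty), lerp(c10, c11, ty), tz)
```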
  • The role of an optical model is to describe how light interacts with particles within the volume. More complex models account for light scattering effects by considering (local) illumination and volumetric shadows. Optical parameters are specified by the data values directly, or they are computed by applying one or more transfer functions to the data to classify particular features in the data.
  • Martin implements volume rendering using volume data sets and provides a depth buffer to assign a depth value for each individual pixel location.
  • the depth buffer, or z-buffer, is converted to a pixel disparity, and the depth buffer value is converted into normalized coordinates in the range [-1, 1]. Then the perspective
  • Wp is the image width in pixels
  • Wr is the image sensor width in real units. If the image sensor width in real units is unknown, Wr can be computed from the camera field of view θ and focal length f as: $W_r = 2 f \tan(\theta / 2)$
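
The depth-to-disparity conversion can be sketched as below. The intermediate formulas are not preserved in this text, so the sketch substitutes common conventions and should be read as an assumption throughout: an OpenGL-style depth buffer in [0, 1] with a standard perspective projection, and the usual parallel-camera relation between eye depth and disparity. The names `z_near`, `z_far` and `baseline` are hypothetical; `w_p`, `w_r`, the field of view and `focal_length` correspond to Wp, Wr, θ and f above.

```python
import math


def sensor_width(fov_rad, focal_length):
    """Sensor width in real units from field of view and focal length."""
    return 2.0 * focal_length * math.tan(fov_rad / 2.0)


def depth_buffer_to_eye_depth(z_b, z_near, z_far):
    """Invert an OpenGL-style perspective depth mapping (assumed convention)."""
    z_c = 2.0 * z_b - 1.0  # depth buffer [0, 1] -> normalized [-1, 1]
    return 2.0 * z_near * z_far / (z_far + z_near - z_c * (z_far - z_near))


def pixel_disparity(z_e, baseline, focal_length, w_p, w_r):
    """Disparity in pixels for a point at eye depth z_e (parallel cameras)."""
    d_real = baseline * focal_length / z_e  # disparity in real units
    return d_real * w_p / w_r               # scale real units to pixels
```

With this convention a depth buffer value of 0 maps to the near plane and a value of 1 maps to the far plane, which is a quick sanity check on the inversion.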
  • View synthesis may also be formulated through warping. While warping is a simple method to synthesize new views, it can produce visual artifacts that can degrade the visual quality of the warped image. The most common of these artifacts are disocclusions, cracks and ghosting.
  • Occlusion holes occur when a foreground object is warped and the reference views do not contain the data for the background pixels that are now in view. Occlusion holes can be fixed by inpainting the hole with available background information or filling the hole with actual data captured by extra references or residue information.
  • Warp cracks occur when two pixels of a surface that are adjacent in the reference view are warped to the new view and are no longer adjacent but are separated by a small number of pixels. Rounding errors can cause warp cracks because the newly calculated pixel coordinates have to be truncated to integer image coordinates, which can cause an adjacent pixel to round differently. Sampling frequency can also cause warp cracks when a surface is warped into an orientation that increases its pixel count, e.g. a plane slanted to the camera that is then viewed perpendicularly. The new view will want to display pixels that were beyond the sampling frequency of the reference camera, leading to cracks in the new image.
  • Ghosting can occur during the backward warping interpolation phase.
  • the back projected pixel neighborhood contains pixels from both a background and foreground object. Pixels from the foreground can bleed color information into the background, which can cause a "halo" or ghosting effect. These artifacts usually occur around occlusion holes and bleed foreground color into the background.
  • One of the main problems with forward warping is that the warped image can contain warp cracks which degrade the visual quality.
  • Depth maps generated with forward warping can easily be fixed by merging multiple views warped from different references or by applying a crack filter. Filters, like a median filter, can effectively remove cracks because a depth map is a very low frequency image, containing mostly subtle gradients or edges around objects. These simple filters will not work on color images due to the complexity of the object textures.
  • a method to remove warp cracks in the color image is to use backward warping. In backward warping, you first forward warp the depth image to get a depth map of the new view. After filtering the cracked depth map, you use the filtered depth to warp back to a reference image.
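
The three steps of backward warping (forward warp the depth, filter the cracks, look up color in the reference) can be sketched on a single image row. This is a simplified illustration: it works on disparity rather than depth, uses a 3-wide median over valid neighbours as the crack filter, and uses nearest-pixel lookup. The sign convention of `shift` and all function names are assumptions.

```python
def forward_warp_disparity(ref_disp, shift):
    """Forward warp a row of disparities to the new view; None marks cracks."""
    width = len(ref_disp)
    out = [None] * width
    for x, d in enumerate(ref_disp):
        tx = int(round(x + shift * d))
        # Larger disparity means a nearer surface, which wins on collision.
        if 0 <= tx < width and (out[tx] is None or d > out[tx]):
            out[tx] = d
    return out


def fill_cracks(disp):
    """Fill missing entries with the median of valid values in a 3-wide window."""
    out = list(disp)
    for x, d in enumerate(disp):
        if d is None:
            window = sorted(v for v in disp[max(0, x - 1):x + 2] if v is not None)
            if window:
                out[x] = window[len(window) // 2]
    return out


def backward_warp_color(ref_color, target_disp, shift):
    """Fetch each target pixel's color from the reference row."""
    width = len(ref_color)
    out = []
    for x, d in enumerate(target_disp):
        if d is None:
            out.append(None)
            continue
        sx = int(round(x - shift * d))
        sx = min(max(sx, 0), width - 1)  # clamp to the reference border
        out.append(ref_color[sx])
    return out
```

Because every target pixel pulls its color from the reference image, the color result has no cracks even though the forward-warped disparity row did.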
  • Interactive direct volume rendering is required for interactive viewing of time-varying 4D volume data, as progressive rendering may not work well for that particular use case as presented by Martin.
  • Example use cases for interactive direct volume rendering include but are not limited to the rendering of static voxel-based data without artifacts during rotation, rendering of time-varying voxel-based (e.g. 4D MRI or Ultrasound, CFD, wave, meteorological, visual effects (OpenVDB) and other physical simulations, etc.) data.
  • a proposed solution includes the use of machine learning to learn how to "warp" volumetric scene views, potentially constrained to a particular transfer function, which communicates how to map the density of the different materials to color and to a level of transparency.
  • a Computational Neural Network could be trained very well with a modest sized data set to allow a user to define a decoder that only works for volume data and only works for a particular transfer function.
  • the potential result is a hardware system, or a decoding system, that, when the desired transfer function is selected, changes the decoder slightly, based on a different training data set, in order to be able to decode the data it has been given.
  • a proposed method disclosed herein is suited for the rendering of 4D volume data. Using current hardware and hardware techniques, it is very difficult to brute force render light fields of volume data. What is proposed is generating a layered scene decomposition of volume data which is to be rendered and decoded using a decoder. Decoding the data is effectively filling in missing pixels or elemental images.
  • a convolutional neural network may be trained to solve smaller versions of the problem, using a system employing column decoding followed by row decoding; a system employing row decoding followed by column decoding may also be utilized. Martin teaches that in order to perform fast but accurate image warping using a disparity map, a form of backward warping with bilinear interpolation is to be implemented.
  • the estimated disparity map for the central view is used as an estimate for all views. Pixels in the novel view that should read data from a location that falls outside the border of the reference view are set to read the closest border pixel in the reference view instead. Essentially, this stretches the border of the reference view in the novel view, rather than producing holes. Since warped pixels rarely fall at integer positions, bilinear interpolation is applied to accumulate information from the four nearest pixels in the reference view. This results in fast warping with no holes, and good accuracy. Martin further discloses a method of training a neural network to apply this correction function. An improvement on this is to teach a neural network compatible with layered scene decomposition. This could be further expanded to apply a convolutional neural network for each layer within a layered scene decomposition that was specifically trained for that layer, each depth. A neural network would then be set to perform row reconstruction while another neural network would be set to perform column reconstruction.
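
The border clamping and four-pixel bilinear accumulation described above can be sketched as follows. It is a minimal sketch of the classical warping step only, not of the neural network correction; the image layout (nested lists of scalar intensities), the function names and the sign convention of the view offsets `du` and `dv` are assumptions.

```python
def sample_bilinear_clamped(image, x, y):
    """Bilinear lookup; positions outside the image read the closest border pixel."""
    h, w = len(image), len(image[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    tx, ty = x - x0, y - y0
    top = (1.0 - tx) * image[y0][x0] + tx * image[y0][x1]
    bottom = (1.0 - tx) * image[y1][x0] + tx * image[y1][x1]
    return (1.0 - ty) * top + ty * bottom


def backward_warp(reference, central_disparity, du, dv):
    """Synthesize the view offset by (du, dv) from the reference view.

    The central view's disparity map is used as the estimate for every view.
    """
    h, w = len(reference), len(reference[0])
    return [[sample_bilinear_clamped(reference,
                                     x + du * central_disparity[y][x],
                                     y + dv * central_disparity[y][x])
             for x in range(w)]
            for y in range(h)]
```

Clamping the sample position stretches the border of the reference view, so the output never contains holes.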
  • a light field display simulator could be used to train the neural network.
  • a light field display simulator provides a high-performance method of exploring the parameterization of simulated virtual three-dimensional light field displays. This method uses a canonical image generation method as part of its computational process to simulate a virtual observer's view(s) of a simulated light field display. The canonical image method provides for a robust, fast and versatile method for generating a simulated light field display and the light field content displayed thereon.
  • the color of an opaque dielectric can be modeled with Lambertian reflectance; the color is considered constant with respect to the viewing angle. This correlates with the standard methods of color measurement used in industry which are based on the same Lambertian (or near Lambertian) reflectance.
  • the fundamental concept of layered scene decomposition is the ability to partition a scene into sets and subsets with the ability to then reconstruct these partitions for the formation of a light field. This concept is based upon the capability to warp pixels in order to reconstruct missing elemental images in a layer.
  • the light intensity of a particular point in one image is mapped into another image in a slightly different pixel position, based on the geometric shift that results when the camera is shifted from left to right.
  • the assumption under which the warping method as described herein can be used to accurately reconstruct the missing pixels within layers is that the pixels mapping from one image to the next have the same color, as they would with a Lambertian lighting model.
  • Light fields, especially when restricted under the assumption of a Lambertian lighting model, have significant redundancy. It may be observed that each elemental image differs very slightly from neighboring images. This redundancy is described in the literature under plenoptic sampling theory. The Lambertian lighting model is sufficient for useful graphics, but not overly realistic.
  • gloss, haze, and goniochromatic color of an object are investigated, including, but not limited to, the specular exponent of the Phong model, the surface roughness of the Ward model, and the surface roughness of the Cook-Torrance model.
  • Gloss may be defined as a measure of the magnitude of the specular reflection
  • haze may be defined as the parameter which captures the width of the specular lobe.
  • shading can be applied as a post-process to reconstructed pixels.
  • This post-process occurs after the warping process (or other view synthesis reconstruction) has occurred.
  • the surface normal information relative to a light position, or point, may be known and included with encoded light field data. This encoded list of light positions allows the decoder to use that normal data, when it is decoding a particular pixel in a layer, to compute a specular component.
  • Other parameters could be included with the light field data, such as whether a surface has a specular component or not, or a number quantifying the degree to which it does.
  • Material properties could also be potentially included along with intensity values. This additional data could be sent in combination with the typical RGB and depth data that is sent with the encoded form within each elemental image, or each layer. Material properties may include, but are not limited to, atomic, chemical, mechanical, thermal and optical properties. The concept of storing surface normals in combination with RGB and depth information is known as a G-Buffer in computer graphics.
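
A specular post-process of this kind can be sketched with the Phong model mentioned earlier. This is one possible sketch under stated assumptions: the decoder is assumed to have a per-pixel normal from the G-Buffer, a light direction and a view direction (both pointing away from the surface), and the specular coefficient `ks` and exponent `shininess` stand in for the encoded material parameters. All names are hypothetical.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def phong_specular(normal, light_dir, view_dir, ks, shininess):
    """Phong specular term ks * max(0, R.V)^shininess with R = 2(N.L)N - L."""
    n, l, v = normalize(normal), normalize(light_dir), normalize(view_dir)
    n_dot_l = dot(n, l)
    if n_dot_l <= 0.0:  # light is behind the surface
        return 0.0
    r = tuple(2.0 * n_dot_l * nc - lc for nc, lc in zip(n, l))
    return ks * max(0.0, dot(r, v)) ** shininess


def shade_pixel(diffuse_rgb, normal, light_dir, view_dir, ks=0.5, shininess=32):
    """Add a view-dependent specular component to a reconstructed pixel."""
    spec = phong_specular(normal, light_dir, view_dir, ks, shininess)
    return tuple(min(1.0, c + spec) for c in diffuse_rgb)
```

Because the diffuse color is reconstructed first and the specular term is added per view afterwards, the warping step can keep its assumption of view-independent color.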
  • each overlapping surface element piece has both (1) a color and (2) a transparency measure, or alpha (α) value.
  • This α value may then be used during the decoding process in order to produce a light field image of a scene containing transparent surfaces. The α values associated with a sampled pixel produced, or selected, during the encoding may be re-projected along with depth values during the reconstruction process, where the light field associated with each layer (or scene subset) is reconstructed.
  • This re-projection can be a warping process as described in relation to depth image-based rendering (DIBR).
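
Once color, α and depth samples have been re-projected, several semi-transparent surface elements may land on the same pixel. One way to combine them, shown here as an assumption rather than as the disclosed method, is to sort them by depth and composite nearest first. The tuple layout `(depth, rgb, alpha)` is hypothetical.

```python
def composite_samples(samples):
    """Composite re-projected samples for one pixel, nearest first.

    `samples` is a list of (depth, (r, g, b), alpha) tuples.
    Returns the composited color and the accumulated opacity.
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light still passing through
    for _depth, rgb, alpha in sorted(samples, key=lambda s: s[0]):
        for c in range(3):
            color[c] += transmittance * alpha * rgb[c]
        transmittance *= 1.0 - alpha
    return tuple(color), 1.0 - transmittance
```

For example, a half-transparent red element in front of an opaque blue one yields an even mix of the two colors with full accumulated opacity.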
  • Figure 15 illustrates an exemplary layered scene decomposition CODEC method, whereby 3D scene data in the format of image description or light field data is loaded to an encoder (400) for encoding, whereupon data (sub)sets as illustrated in the figure, or alternatively the entire data set representing the 3D scene is partitioned (403).
  • the identification process is a general process step reference which is intended to simply refer to the ability to partition the data set in one pass, or in groupings (e.g. to encode inner frustum and outer frustum data layers as illustrated in more detail in Figure 11), as may be desired according to the circumstances.
  • the identification of data subsets may imply pre-encoding processing steps or processing steps also forming part of the encoding sub-process stage (401).
  • Data subsets may be tagged, specified, confirmed, scanned and even compiled or grouped at the time of partitioning to produce a set of layers (decomposition of the 3D scene) (403).
  • each data layer is sampled and rendered according to the present disclosure to produce compressed (image) data (404).
  • the compressed data is transmitted to a decoder (405) for the decoding sub-process comprising decompression, decoding and re-composition steps to (re)construct a set of light fields (407), otherwise referred to herein as "layered light fields", layered light field images and light field layers. Specular lighting is calculated (411) and the constructed layered light fields are merged to produce the final light field (408) displaying the 3D scene (409).
  • Figure 16 illustrates a computer-implemented method comprising:
  • the first data set comprises information on directions of normals on surfaces included in the scene (421); the directions of the normals are represented with respect to a reference direction (422); and
  • the reflection properties of at least some of the surfaces are non-Lambertian
  • the method further comprises:
  • Figure 21 illustrates a computer-implemented method comprising:
  • the first data set comprises information on transparency of surfaces included in the scene (429);
  • encoding multiple layers to generate a second data set, wherein a size of the second data set is smaller than a size of the first data set (424).
  • the method further comprises:
  • EXAMPLE 1 Exemplary Encoder and Encoding Method for a Light Field Display
  • a conventional display as previously known in the art consists of spatial pixels substantially evenly-spaced and organized in a two-dimensional row, allowing for an idealized uniform sampling.
  • a three-dimensional (3D) display requires both spatial and angular samples. While the spatial sampling of a typical three-dimensional display remains uniform, the angular samples cannot necessarily be considered uniform in terms of the display’s footprint in angular space.
  • a plurality of light field planar-parameterized pinhole projectors provide angular samples, also known as directional components of the light field.
  • the light field display is designed for a 640 x 480 spatial resolution and a 512 x 512 angular resolution.
  • the plurality of planar-parameterized pinhole projectors are idealized with identity function α.
  • the pitch between each of the plurality of planar-parameterized pinhole projectors is 1 mm, thereby defining a 640 mm x 480 mm display surface.
  • Current displays known in the art are driven by DisplayPort technology providing maximum bandwidths of 32.4 Gb/s. Therefore, such displays would require over 1024 DisplayPort cables to provide the tremendous bandwidth required by interactive light field displays, resulting in cost and form-factor design constraints.
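
The cable count follows from simple arithmetic on the display parameters. The sketch below assumes 24 bits per pixel (8-bit RGB) and 30 frames per second; neither value is stated at this point in the text, but together they reproduce the figure of roughly 58 Tb/s quoted for the final light field in Example 2.

```python
import math

SPATIAL = (640, 480)          # spatial resolution from the text
ANGULAR = (512, 512)          # angular resolution from the text
BITS_PER_PIXEL = 24           # assumption: 8-bit RGB
FRAMES_PER_SECOND = 30        # assumption
DISPLAYPORT_BITS_PER_SECOND = 32.4e9  # maximum bandwidth from the text


def raw_light_field_bits_per_second():
    """Uncompressed light field data rate in bits per second."""
    pixels = SPATIAL[0] * SPATIAL[1] * ANGULAR[0] * ANGULAR[1]
    return pixels * BITS_PER_PIXEL * FRAMES_PER_SECOND


def cables_needed():
    """Number of DisplayPort links needed to carry the raw data rate."""
    return math.ceil(raw_light_field_bits_per_second() / DISPLAYPORT_BITS_PER_SECOND)
```

Under these assumptions the raw rate is about 5.8 x 10^13 bits per second, which needs well over 1024 links.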
  • the illustrative embodiment delivers data to a light field display from a computer equipped with an accelerated GPU with dual DisplayPort 1.3 cables output. We consider a conservative maximum throughput of 40 Gb/s. The encoded frames must be small enough for transmission over the DisplayPort connection to a decoding unit physically located closer to the light field display.
  • the layered scene decomposition of the illustrative embodiment is designed to allow the required data throughput.
  • the layered scene decomposition places a plurality of layered scene decomposition layers within the depth of field region of the light field display, ensuring that the distance of the layered scene decomposition layers from the display surface is less than Z_DOF.
  • This illustrative example describes a light field display with objects located only within the inner frustum volumes of the display.
  • This illustrative example is not intended to limit the scope of the invention, as the invention can successfully implement a plurality of system parameters, such as a light field display with objects located only within the outer frustum volume of the display, or a light field display with objects located within both the inner and outer frustum volumes of the display; embodiments limited to one frustum volume require a smaller number of layered scene decomposition layers, thereby marginally decreasing the size of the encoded light field to be produced.
  • the illustrative embodiment defines ten layered scene decomposition layers. When necessary, additional layered scene decomposition layers can be added to capture data that could be lost to occlusions, or to increase the overall compression rate. However, additional layered scene decomposition layers require additional computation from the decoder, thus the number of layered scene decomposition layers is carefully chosen.
  • the illustrative embodiment specifies the ten layered scene decomposition layers from their front and back boundaries and assumes that the dividing boundaries of the layer are parallel to the display surface.
  • Each layered scene decomposition layer is located at a defined distance from the display surface, where the distances are specified in terms of multiples of focal length f, up to the maximum depth of field of 512f.
  • Layered scene decomposition layers with a narrower width are concentrated closer to the display surface, and the layer width (i.e., the depth difference between the front and back layer boundaries) increases exponentially by powers of 2 as the distance from the display surface increases.
  • This embodiment of the invention is not intended to limit the scope of the invention, as other layer configurations can be implemented successfully.
  • Table 1 describes the layered scene decomposition layer configurations of the illustrative embodiment, and provides a sampling scheme based on plenoptic sampling theory to create sub-sampled layered scene decomposition layers:
  • layer 0 captures images that are to be displayed at the display surface, as in a conventional two-dimensional display known in the art.
  • Layer 0 contains 640 x 480 pixels at a fixed depth, so it does not require any depth information.
  • the total data size is calculated for each pixel with an RGB value and a depth value of 8 bits each (alternate embodiments may require larger bit values, such as 16 bits).
  • the elemental image resolution and sampling gap are calculated from the formulas described above, and the sampling scheme chosen reflects the elemental image resolution and sampling gap restrictions.
  • the layered scene decomposition layers are configured by an encoder, efficiently implementing an oblique rendering technique to produce the layers located closer to the display surface (layers 0 to 5) and a perspective rendering technique to produce the layers located further away from the display surface (layers 6 to 9).
  • Each elemental image corresponds to a single rendering view.
  • Figure 8 illustrates the illustrative embodiment, with ten layered scene decomposition layers (100-109) in the inner frustum volume (110).
  • the inner frustum volume layers extend from the display surface (300).
  • the layers are defined as described in the table above; for example, the front boundary of inner frustum volume layer 0 (100) is 1f, of inner frustum volume layer 1 (101) is 1f, of inner frustum volume layer 2 (102) is 2f, of inner frustum volume layer 3 (103) is 4f, and so on.
  • Inner frustum volume layers 0 to 5 (100-105), the layers closest to the display surface (300), are rendered with the oblique rendering technique, and inner frustum volume layers 6 to 9 (106-109), furthest from the display surface, are rendered with the perspective rendering technique.
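
The layer placement can be sketched by doubling the layer width until the maximum depth of field is reached. Table 1 is not reproduced in this text, so the boundary values generated here are an assumption: layer 0 is taken to sit at a fixed depth of 1f, and the remaining layers are taken to tile the range from 1f to 512f with back boundaries at successive powers of 2.

```python
def layer_boundaries(max_depth=512):
    """Return (front, back) boundaries per layer, in multiples of focal length f."""
    layers = [(1, 1)]  # layer 0: fixed depth at the display surface (assumed 1f)
    front = 1
    while front < max_depth:
        layers.append((front, 2 * front))  # layer width doubles with distance
        front *= 2
    return layers


for index, (front, back) in enumerate(layer_boundaries()):
    print(f"layer {index}: front {front}f, back {back}f")
```

With a maximum depth of 512f this yields ten layers, the last spanning 256f to 512f.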
  • Figure 9 illustrates an alternate embodiment, with ten layered scene decomposition layers (100-109) in the inner frustum volume (110) and ten layered scene decomposition layers (200-209) in the outer frustum volume (210).
  • the inner and outer frustum volume layers extend from the display surface (300). While the inner and outer frustum volume layers are illustrated as mirror images from each other, the inner and outer frustum volume may have differing numbers of layers, layers of different sizes, or layers of different depths.
  • Inner frustum volume layers 0 to 5 (100-105) and outer frustum volume layers 0 to 5 (200-205) are rendered with the oblique rendering technique, and inner frustum volume layers 6 to 9 (106-109) and outer frustum volume layers 6 to 9 (206-209) farther from the display surface (300) are rendered with the perspective rendering technique.
  • An alternate embodiment can implement the system with a ray-tracing encoding based approach.
  • Rendering a complete layered scene decomposition layer representation can require increased GPU performance, even with the optimizations described herein, as GPUs are optimized for interactive graphics on conventional two-dimensional displays where accelerated rendering of single views is desirable.
  • the computational cost of the ray-tracing approach is a direct function of the number of pixels the system is to render. While the layered scene decomposition layer system contains a comparable number of pixels to some two-dimensional single view systems, the form and arrangement of said pixels differs greatly due to layer decomposition and corresponding sampling schemes. Therefore, there may be implementations where tracing some or all of the rays is a more efficient implementation.
  • EXAMPLE 2 CODEC Decoder and Decoding Method for a Light Field Display
  • the decoder receives the 12.01 GB/s of encoded core representation data, plus any residue representation data, from the GPU over dual DisplayPort 1.3 cables.
  • the compressed core representation data is decoded using a customized FPGA, ASIC, or other integrated circuit to implement efficient decoding (residue representation data is decoded separately, as illustrated in Figure 13).
  • the 12.01 GB/s core representation is decompressed to 58 Tb/s for the final light field display. Note that this core representation does not include the residue representations necessary to render occlusions.
  • The provides a compression ratio
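
The compression ratio implied by the two data rates above can be checked with a short calculation. The ratio itself is not preserved in this text, so the sketch only derives what the stated rates imply. It evaluates both readings of the encoded rate, as gigabits and as gigabytes per second, because only the gigabit reading fits within the 40 Gb/s link budget given in Example 1; which reading was intended is an assumption.

```python
RAW_BITS_PER_SECOND = 58e12       # final light field rate from the text
ENCODED_RATE = 12.01e9            # encoded core representation rate from the text
LINK_BITS_PER_SECOND = 40e9       # conservative dual DisplayPort 1.3 throughput


def compression_ratio(raw_bps, encoded_bps):
    """Ratio of raw to encoded data rate."""
    return raw_bps / encoded_bps


ratio_if_bits = compression_ratio(RAW_BITS_PER_SECOND, ENCODED_RATE)
ratio_if_bytes = compression_ratio(RAW_BITS_PER_SECOND, ENCODED_RATE * 8)
fits_if_bits = ENCODED_RATE <= LINK_BITS_PER_SECOND
fits_if_bytes = ENCODED_RATE * 8 <= LINK_BITS_PER_SECOND
```

Read as bits, the ratio is on the order of several thousand to one; read as bytes, it is on the order of several hundred to one and the stream would exceed the stated link throughput.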
  • the reconstructed light field data may still exhibit occlusion-based artifacts unless residue representation data is included in the reconstruction.
  • data is decoded by reconstructing individual layered scene decomposition layers and merging the reconstructed layers into an inner frustum volume layer.
  • data is decoded by reconstructing individual layered scene decomposition layers and merging the reconstructed layers into an inner frustum volume layer and an outer frustum volume layer.
  • a single layered scene decomposition layer can be reconstructed from the data sampled under a given sampling scheme, using view synthesis techniques from the field of Image-Based Rendering which are known in the art.
  • Graziosi et al. specify using reference elemental images to reconstruct the light field in a single pass.
  • This method uses reference elemental images offset from the reconstructed image in multiple dimensions.
  • the elemental image data represents three dimensional scene points (including RGB color and disparity)
  • pixels are decoded as a nonlinear function (although fixed on the directional vector between the reference and target elemental images), therefore requiring a storage buffer of equal size to the decoding reference elemental images.
  • this can create memory storage or bandwidth constraints, depending on the decoding hardware.
  • Current high-performance FPGA devices provide internal block memory (BRAM) organized as 18/20-bit wide memory and 1024 memory locations which can be used as a 36/40-bit wide memory with 512 memory locations.
  • a buffer capable of reading and writing an image in the same clock cycle is large enough to hold two reference elemental images, as the nonlinear decoding process causes the write port to use a non-deterministic access pattern. Implementing this buffer in an FPGA device for a 512 x 512 pixel image requires 1024 BRAM blocks.
  • each buffer may be required in each decoder pipeline.
  • the system may require more than one hundred parallel pipelines, which is orders of magnitude more pipelines than current FPGA devices support. Because each buffer requires an independent read/write port, it may not be possible to implement such a system on current ASIC devices.
  • the present disclosure circumvents buffer and memory limitations by dividing the pixel reconstruction process into multiple, single-dimension stages.
  • the present disclosure implements one dimensional reconstruction to fix the directional vector between the reference elemental images and the target to a rectified path. While reconstruction remains nonlinear, the reference pixel to be translated to the target location is locked to the same row or column location of the target pixel. Therefore, decoder buffers only need to capture one row or one column at a time.
  • the decoder buffer is organized as a 24-bit wide, 1024 deep memory requiring two 36/40 x 512 BRAM. Therefore, the present disclosure has reduced the memory footprint by a factor of 512, or multiple orders of magnitude. This allows a display pixel fill rate requiring over a hundred decoding pipelines to be supported by current FPGA devices.
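
The buffer sizes quoted above can be reproduced with a short calculation. The sketch assumes that each 24-bit pixel occupies one word of a BRAM configured as 36/40 bits wide by 512 locations; that packing is an assumption, chosen because it matches the block counts in the text.

```python
BRAM_DEPTH = 512  # locations per BRAM in the 36/40-bit wide configuration


def brams_for_pixels(num_pixels, depth=BRAM_DEPTH):
    """BRAM blocks needed when each word holds one 24-bit pixel."""
    return -(-num_pixels // depth)  # ceiling division


# Two full 512 x 512 reference elemental images (single-pass decoding).
full_image_buffer = brams_for_pixels(2 * 512 * 512)

# A 1024-deep row or column buffer (one-dimensional decoding).
line_buffer = brams_for_pixels(1024)

reduction = full_image_buffer // line_buffer
```

This gives 1024 blocks for the full-image buffer, 2 blocks for the line buffer, and a reduction factor of 512.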
  • Multi-stage decoding architectures require two stages to reconstruct the two dimensional pixel array in a light field display.
  • the two stages are orthogonal to one another and reconstruct rows or columns of elemental images.
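
The two orthogonal stages can be sketched on a small grid of elemental images. To stay short, each elemental image is represented by a single number, and `synthesize` is a linear blend standing in for the actual one-dimensional warping step; both simplifications, and all names, are assumptions. The grid is assumed to hold reference images at every `gap`-th row and column, with the last row and column being references.

```python
def synthesize(left, right, t):
    """Stand-in for 1D view synthesis between two references (linear blend)."""
    return (1.0 - t) * left + t * right


def fill_line(line, gap):
    """Reconstruct missing entries of one row or column from its references."""
    out = list(line)
    for i in range(len(line)):
        if i % gap:  # not a reference position
            lo = (i // gap) * gap
            out[i] = synthesize(line[lo], line[lo + gap], (i - lo) / gap)
    return out


def decode_two_stage(grid, gap):
    """Stage 1 decodes rows that hold references; stage 2 decodes every column."""
    rows = [fill_line(r, gap) if y % gap == 0 else r for y, r in enumerate(grid)]
    height, width = len(rows), len(rows[0])
    cols = [fill_line([rows[y][x] for y in range(height)], gap) for x in range(width)]
    return [[cols[x][y] for x in range(width)] for y in range(height)]
```

Each stage only ever touches one row or one column at a time, which is what allows the decoder buffer to shrink from a full image to a single line.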
  • the first decoding stage may require a pixel scheduler to ensure that output pixels are ordered to be compatible with the next stage input pixels. Due to the extremely high bandwidth required by each decoding stage, some output pixels from a previous stage may need to be reused to reduce local storage requirements. In this case, an external buffer can be used to capture all of the output pixels from a first stage so the subsequent decoding stage can efficiently access pixel data, reducing logic resources and memory bandwidth.
  • the present disclosure's multi-stage decoding with an external memory buffer allows the decoding process to transfer the required memory bandwidth from expensive on-die memory to lower cost memory devices such as double data rate (DDR) memory devices.
  • a high-performance decoding pixel scheduler ensures maximum reference pixel reuse from this external memory buffer, allowing the system to use narrower or slower memory interfaces.
  • DODGSON, N.A. Analysis of the viewing zone of multiview autostereoscopic displays. Electronic Imaging 2002. International Society for Optics and Photonics, pp. 254-265, 2002.
  • GRAZIOSI, D.B., ALPASLAN, Z.Y., EL-GHOROURY, H.S. Compression for Full-Parallax Light Field Displays. Proc. SPIE 9011, Stereoscopic Displays and Applications XXV, 90111A, March 2014.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Software Systems (AREA)
  • Geometry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Image Processing (AREA)
  • Image Generation (AREA)
PCT/CA2020/050228 2019-02-22 2020-02-22 Layered scene decomposition codec system and methods Ceased WO2020181360A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
CN202080016205.2A CN113748682B (zh) 2019-02-22 2020-02-22 分层场景分解编解码系统及方法
KR1020217028928A KR102602719B1 (ko) 2019-02-22 2020-02-22 레어어화된 장면 분해 코덱 시스템 및 방법들
CN202511380095.9A CN121509648A (zh) 2019-02-22 2020-02-22 分层场景分解编解码系统及方法
CA3127545A CA3127545C (en) 2019-02-22 2020-02-22 Layered scene decomposition codec system and methods
JP2021544756A JP7387193B2 (ja) 2019-02-22 2020-02-22 層状シーン分解コーデックシステムおよび方法

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201962809390P 2019-02-22 2019-02-22
US62/809,390 2019-02-22

Publications (1)

Publication Number Publication Date
WO2020181360A1 (en) 2020-09-17

Family

ID=72140181

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CA2020/050228 Ceased WO2020181360A1 (en) 2019-02-22 2020-02-22 Layered scene decomposition codec system and methods

Country Status (6)

Country Link
US (13) US11363249B2
JP (1) JP7387193B2
KR (1) KR102602719B1
CN (2) CN113748682B
CA (1) CA3127545C
WO (1) WO2020181360A1

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220189137A1 (en) * 2020-12-10 2022-06-16 Canon Kabushiki Kaisha Apparatus, method, and storage medium

Families Citing this family (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8866920B2 (en) 2008-05-20 2014-10-21 Pelican Imaging Corporation Capturing and processing of images using monolithic camera array with heterogeneous imagers
US8878950B2 (en) 2010-12-14 2014-11-04 Pelican Imaging Corporation Systems and methods for synthesizing high resolution images using super-resolution processes
CN104081414B (zh) * 2011-09-28 2017-08-01 Fotonation开曼有限公司 用于编码和解码光场图像文件的系统及方法
CN107346061B (zh) 2012-08-21 2020-04-24 快图有限公司 用于使用阵列照相机捕捉的图像中的视差检测和校正的系统和方法
US8866912B2 (en) 2013-03-10 2014-10-21 Pelican Imaging Corporation System and methods for calibration of an array camera using a single captured image
EP3565259A1 (en) * 2016-12-28 2019-11-06 Panasonic Intellectual Property Corporation of America Three-dimensional model distribution method, three-dimensional model receiving method, three-dimensional model distribution device, and three-dimensional model receiving device
EP3598390A1 (en) * 2018-07-19 2020-01-22 Thomson Licensing Method for estimating a depth for pixels, corresponding device and computer program product
CN111104830A (zh) * 2018-10-29 2020-05-05 富士通株式会社 用于图像识别的深度学习模型、该模型的训练装置及方法
US11363249B2 (en) * 2019-02-22 2022-06-14 Avalon Holographics Inc. Layered scene decomposition CODEC with transparency
US11172148B2 (en) 2020-01-07 2021-11-09 Google Llc Methods, systems, and media for generating compressed images
US11941752B2 (en) * 2020-07-21 2024-03-26 Nvidia Corporation Streaming a compressed light field
KR20230044148A (ko) * 2020-07-31 2023-04-03 구글 엘엘씨 View synthesis robust to unconstrained image data
US11501467B2 (en) * 2020-11-03 2022-11-15 Nvidia Corporation Streaming a light field compressed utilizing lossless or lossy compression
CN112967242B (zh) * 2021-02-26 2023-07-04 北京信息科技大学 Light field quality assessment method based on visual feature aggregation
US11640647B2 (en) * 2021-03-03 2023-05-02 Qualcomm Incorporated Methods and apparatus for intra-wave texture looping
US11394940B1 (en) * 2021-04-16 2022-07-19 Texas Instruments Incorporated Dynamic image warping
KR102797392B1 (ko) 2021-08-03 2025-04-21 레이아 인코포레이티드 View synthesis system and method using a depth map
WO2024123372A1 (en) * 2022-12-09 2024-06-13 Innopeak Technology, Inc. Serialization and deserialization of layered depth images for 3d rendering
CN116800985A (zh) * 2022-03-15 2023-09-22 华为技术有限公司 Encoding and decoding method and apparatus
US11501410B1 (en) * 2022-03-22 2022-11-15 Illuscio, Inc. Systems and methods for dynamically rendering three-dimensional images with varying detail to emulate human vision
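CN114693852B (zh) * 2022-03-28 2025-08-19 北京达佳互联信息技术有限公司 Hair rendering method and apparatus, electronic device, and storage medium
CN114492251B (zh) * 2022-04-18 2022-07-15 国家超级计算天津中心 Method, apparatus, device, and medium for handling low-speed flow field divergence in a supercomputing environment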
US12374031B2 (en) * 2022-05-18 2025-07-29 Avalon Holographics Inc. Light field offset rendering
FR3135810B1 (fr) * 2022-05-19 2024-10-11 Thales Sa Method for generating a peripheral image of an aircraft, and associated electronic generation device and computer program product
CN115082609B (zh) * 2022-06-14 2025-02-14 Oppo广东移动通信有限公司 Image rendering method and apparatus, storage medium, and electronic device
CN115393498B (zh) * 2022-07-27 2025-09-09 浙江大学 Rendering method and apparatus based on merging implicit light transport functions
JP7773960B2 (ja) * 2022-10-13 2025-11-20 Kddi株式会社 Point cloud decoding device, point cloud decoding method, and program
CN116546175B (zh) * 2023-06-01 2023-10-31 深圳创疆网络科技有限公司 Intelligent projector control method and apparatus based on automatic sensing
EP4720816A1 (en) * 2023-06-02 2026-04-08 Apple Inc. Rendering layers with different perception quality
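CN116503536B (zh) * 2023-06-27 2024-04-05 深圳臻像科技有限公司 Light field rendering method based on scene layering
CN116824029B (zh) * 2023-07-13 2024-03-08 北京弘视科技有限公司 Method and apparatus for generating holographic image shadows, electronic device, and storage medium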
US12501181B2 (en) * 2023-07-24 2025-12-16 Varjo Technologies Oy Complementing subsampling in stereo cameras
WO2025184103A1 (en) * 2024-02-29 2025-09-04 Dolby Laboratories Licensing Corporation Layered surface lightfields
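KR20250142686A (ko) * 2024-03-22 2025-09-30 한국전자기술연구원 3D display device with reduced visual fatigue and dizziness
CN118247444B (zh) * 2024-05-29 2024-08-06 腾讯科技(深圳)有限公司 Tri-plane-based processing method and related apparatus
TWI900161B (zh) * 2024-08-06 2025-10-01 大陸商北京集創北方科技股份有限公司 LED lamp bead coordinate calculation method and display calibration system
KR102805327B1 (ko) * 2024-10-10 2025-05-13 주식회사 덱스터스튜디오 Video production method for high-quality upscaling of multi-layer images
CN121120828A (zh) * 2025-09-04 2025-12-12 锦莱炎(成都)科技有限公司 Graphics card optimization method and system based on intelligent rendering acceleration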

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105451024A (zh) * 2015-12-31 2016-03-30 北京大学 Digital hologram encoding and transmission method using compressed sensing
US20160173883A1 (en) * 2014-12-16 2016-06-16 Sean J. Lawrence Multi-focus image data compression
WO2016153850A1 (en) * 2015-03-26 2016-09-29 Otoy, Inc. Relightable holograms
US20180061119A1 (en) * 2016-08-24 2018-03-01 Google Inc. Quadrangulated layered depth images
US10089796B1 (en) * 2017-11-01 2018-10-02 Google Llc High quality layered depth image texture rasterization
WO2019036794A1 (en) * 2017-08-23 2019-02-28 Matthew Hamilton SYSTEM AND METHODS FOR LAYERED SCENE DECOMPOSITION CODEC

Family Cites Families (65)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4495718A (en) * 1983-05-27 1985-01-29 Benjamin Margalit Photographic display device
US6366370B1 (en) * 1998-12-30 2002-04-02 Zebra Imaging, Inc. Rendering methods for full parallax autostereoscopic displays
US6549308B1 (en) 2000-01-11 2003-04-15 Zebra Imaging, Inc. Unibiased light field models for rendering and holography
US6900904B1 (en) 2000-02-04 2005-05-31 Zebra Imaging, Inc. Distributed system for producing holographic stereograms on-demand from various types of source material
SE0203908D0 (sv) * 2002-12-30 2002-12-30 Abb Research Ltd An augmented reality system and method
US8166128B1 (en) * 2003-02-28 2012-04-24 Oracle America, Inc. Systems and methods for dynamically updating a virtual volume in a storage virtualization environment
US20050185711A1 (en) 2004-02-20 2005-08-25 Hanspeter Pfister 3D television system and method
EP1894412A1 (en) * 2005-02-18 2008-03-05 THOMSON Licensing Method for deriving coding information for high resolution images from low resoluton images and coding and decoding devices implementing said method
US8289370B2 (en) * 2005-07-20 2012-10-16 Vidyo, Inc. System and method for scalable and low-delay videoconferencing using scalable video coding
US20160360155A1 (en) * 2005-09-07 2016-12-08 Vidyo, Inc. System and method for scalable and low-delay videoconferencing using scalable video coding
GB0520829D0 (en) * 2005-10-13 2005-11-23 Univ Cambridge Tech Image processing methods and apparatus
US7836356B2 (en) * 2008-04-28 2010-11-16 International Business Machines Corporation Method for monitoring dependent metric streams for anomalies
US8253730B1 (en) * 2008-08-29 2012-08-28 Adobe Systems Incorporated System and method for construction of data structures for ray tracing using bounding hierarchies
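CN102165496B (zh) * 2008-09-25 2014-08-13 皇家飞利浦电子股份有限公司 Three-dimensional image data processing
TWI542190B (zh) * 2008-11-04 2016-07-11 皇家飛利浦電子股份有限公司 Method and system for encoding a three-dimensional image signal, encoded three-dimensional image signal, and method and system for decoding a three-dimensional image signal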
US8908058B2 (en) * 2009-04-18 2014-12-09 Lytro, Inc. Storage and transmission of pictures including multiple frames
US20120044322A1 (en) * 2009-05-01 2012-02-23 Dong Tian 3d video coding formats
US8948271B2 (en) * 2011-01-13 2015-02-03 Texas Instruments Incorporated Method and apparatus for a low complexity transform unit partitioning structure for HEVC
US8553769B2 (en) * 2011-01-19 2013-10-08 Blackberry Limited Method and device for improved multi-layer data compression
US20120188226A1 (en) 2011-01-21 2012-07-26 Bu Lin-Kai Method and system for displaying stereoscopic images
US9584805B2 (en) * 2012-06-08 2017-02-28 Qualcomm Incorporated Prediction mode information downsampling in enhanced layer coding
US9860522B2 (en) 2012-08-04 2018-01-02 Paul Lapstun Head-mounted light field display
CN103856777A (zh) * 2012-12-04 2014-06-11 中山大学深圳研究院 Video encoding and decoding method based on light field rendering
US9456141B2 (en) * 2013-02-22 2016-09-27 Lytro, Inc. Light-field based autofocus
US9405124B2 (en) * 2013-04-09 2016-08-02 Massachusetts Institute Of Technology Methods and apparatus for light field projection
US9412172B2 (en) * 2013-05-06 2016-08-09 Disney Enterprises, Inc. Sparse light field representation
US9786062B2 (en) * 2013-05-06 2017-10-10 Disney Enterprises, Inc. Scene reconstruction from high spatio-angular resolution light fields
US9639773B2 (en) * 2013-11-26 2017-05-02 Disney Enterprises, Inc. Predicting a light probe for an outdoor image
US10244223B2 (en) * 2014-01-10 2019-03-26 Ostendo Technologies, Inc. Methods for full parallax compressed light field 3D imaging systems
AU2015230086A1 (en) * 2014-03-14 2016-09-29 Samsung Electronics Co., Ltd. Multi-layer video encoding method and multi-layer video decoding method using depth block
KR102224718B1 (ko) 2014-08-06 2021-03-08 삼성전자주식회사 Method and apparatus for generating a hologram
US10235338B2 (en) 2014-09-04 2019-03-19 Nvidia Corporation Short stack traversal of tree data structures
US10242485B2 (en) * 2014-09-04 2019-03-26 Nvidia Corporation Beam tracing
WO2016038240A1 (en) * 2014-09-09 2016-03-17 Nokia Technologies Oy Stereo image recording and playback
JP6217608B2 (ja) 2014-11-27 2017-10-25 株式会社デンソー Magnetic detection device and torque sensor using the same
US10546424B2 (en) * 2015-04-15 2020-01-28 Google Llc Layered content delivery for virtual and augmented reality experiences
EP3286737A1 (en) 2015-04-23 2018-02-28 Ostendo Technologies, Inc. Methods for full parallax compressed light field synthesis utilizing depth information
US9852537B2 (en) * 2015-05-01 2017-12-26 Otoy Inc. Rendering via ray-depth field intersection
US20170034530A1 (en) * 2015-07-28 2017-02-02 Microsoft Technology Licensing, Llc Reduced size inverse transform for decoding and encoding
EP3142366A1 (en) * 2015-09-14 2017-03-15 Thomson Licensing Method and apparatus for encoding and decoding a light field based image, and corresponding computer program product
US10134179B2 (en) * 2015-09-30 2018-11-20 Visual Music Systems, Inc. Visual music synthesizer
US10448030B2 (en) * 2015-11-16 2019-10-15 Ostendo Technologies, Inc. Content adaptive light field compression
EP3171598A1 (en) * 2015-11-19 2017-05-24 Thomson Licensing Methods and devices for encoding and decoding a matrix of views obtained from light-field data, corresponding computer program and non-transitory program storage device
EP3185214B1 (en) 2015-12-22 2025-05-14 Dassault Systèmes Streaming of hybrid geometry and image based 3d objects
US10429639B2 (en) 2016-01-31 2019-10-01 Paul Lapstun Head-mounted light field display
US9942548B2 (en) * 2016-02-16 2018-04-10 Google Llc Entropy coding transform partitioning information
US10026220B2 (en) 2016-05-18 2018-07-17 Siemens Healthcare Gmbh Layered lightfields for occlusion handling
EP3273686A1 (en) * 2016-07-21 2018-01-24 Thomson Licensing A method for generating layered depth data of a scene
US10661342B2 (en) * 2016-09-29 2020-05-26 Nlight, Inc. Additive manufacturing systems and methods for the same
KR20180039323A (ko) * 2016-10-10 2018-04-18 디지털인사이트 주식회사 Video coding method and apparatus using a combination of various block partitioning structures
KR20180045530A (ko) * 2016-10-26 2018-05-04 디지털인사이트 주식회사 Video coding method and apparatus using arbitrary block partitioning
US10389994B2 (en) * 2016-11-28 2019-08-20 Sony Corporation Decoder-centric UV codec for free-viewpoint video streaming
FR3062011B1 (fr) 2017-01-17 2020-01-10 Stmicroelectronics (Grenoble 2) Sas Method and device for controlled dynamic encoding of a multidimensional digital signal, in particular an image signal, and corresponding decoding method and device
US10893262B2 (en) * 2017-02-07 2021-01-12 Siemens Healthcare Gmbh Lightfield rendering based on depths from physically-based volume rendering
US20180262758A1 (en) * 2017-03-08 2018-09-13 Ostendo Technologies, Inc. Compression Methods and Systems for Near-Eye Displays
US10475165B2 (en) * 2017-04-06 2019-11-12 Disney Enterprises, Inc. Kernel-predicting convolutional neural networks for denoising
US20180310907A1 (en) * 2017-05-01 2018-11-01 EchoPixel, Inc. Simulated Fluoroscopy Images with 3D Context
US11051039B2 (en) * 2017-06-02 2021-06-29 Ostendo Technologies, Inc. Methods for full parallax light field compression
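CN119135920A (zh) 2018-05-02 2024-12-13 奎蒂安特有限公司 Codec for processing scenes with almost unlimited detail
CN112840649A (zh) * 2018-09-21 2021-05-25 Lg电子株式会社 Method for decoding an image by using block partitioning in an image coding system, and apparatus therefor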
US10924727B2 (en) * 2018-10-10 2021-02-16 Avalon Holographics Inc. High-performance light field display simulator
US11284091B2 (en) * 2019-03-25 2022-03-22 Apple Inc. Video based point cloud compression-patch alignment and size determination in bounding box
EP3918805B1 (en) * 2019-02-11 2025-01-08 Beijing Dajia Internet Information Technology Co., Ltd. Methods and devices for intra sub-partition coding mode
US11363249B2 (en) * 2019-02-22 2022-06-14 Avalon Holographics Inc. Layered scene decomposition CODEC with transparency
US11809161B2 (en) * 2020-07-13 2023-11-07 Lawrence Livermore National Security, Llc Computed axial lithography optimization system

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160173883A1 (en) * 2014-12-16 2016-06-16 Sean J. Lawrence Multi-focus image data compression
WO2016153850A1 (en) * 2015-03-26 2016-09-29 Otoy, Inc. Relightable holograms
CN105451024A (zh) * 2015-12-31 2016-03-30 北京大学 Digital hologram encoding and transmission method using compressed sensing
US20180061119A1 (en) * 2016-08-24 2018-03-01 Google Inc. Quadrangulated layered depth images
WO2019036794A1 (en) * 2017-08-23 2019-02-28 Matthew Hamilton SYSTEM AND METHODS FOR LAYERED SCENE DECOMPOSITION CODEC
US10089796B1 (en) * 2017-11-01 2018-10-02 Google Llc High quality layered depth image texture rasterization

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GILLIAM, C. ET AL.: "Adaptive plenoptic sampling", 2011 18TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, 11 September 2011 (2011-09-11), pages 2581 - 2584, XP032080198, Retrieved from the Internet <URL:https://ieeexplore.ieee.org/document/6116192> [retrieved on 20200414], DOI: 10.1109/ICIP.2011.6116192 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220189137A1 (en) * 2020-12-10 2022-06-16 Canon Kabushiki Kaisha Apparatus, method, and storage medium
JP2022092235A (ja) * 2020-12-10 2022-06-22 キヤノン株式会社 Information processing apparatus, information processing method, and program
US12165381B2 (en) * 2020-12-10 2024-12-10 Canon Kabushiki Kaisha Apparatus, method, and storage medium
JP7614815B2 (ja) 2020-12-10 2025-01-16 キヤノン株式会社 Information processing apparatus, information processing method, and program

Also Published As

Publication number Publication date
CN113748682B (zh) 2025-10-21
US20220400242A1 (en) 2022-12-15
US12261989B2 (en) 2025-03-25
JP7387193B2 (ja) 2023-11-28
US20210152807A1 (en) 2021-05-20
KR20210126079A (ko) 2021-10-19
US20220109817A1 (en) 2022-04-07
KR102602719B1 (ko) 2023-11-16
US10911735B2 (en) 2021-02-02
US20220217319A1 (en) 2022-07-07
US20210203903A1 (en) 2021-07-01
US11876950B2 (en) 2024-01-16
US20200275074A1 (en) 2020-08-27
CA3127545C (en) 2024-05-21
US11818327B2 (en) 2023-11-14
US11876949B2 (en) 2024-01-16
JP2022523335A (ja) 2022-04-22
US20200275076A1 (en) 2020-08-27
US11743443B2 (en) 2023-08-29
US11457197B2 (en) 2022-09-27
US20200273188A1 (en) 2020-08-27
US11363249B2 (en) 2022-06-14
US20200275075A1 (en) 2020-08-27
US20220232196A1 (en) 2022-07-21
CA3127545A1 (en) 2020-09-17
US11330244B2 (en) 2022-05-10
US10986326B2 (en) 2021-04-20
US20240040103A1 (en) 2024-02-01
US20200275073A1 (en) 2020-08-27
US20250193360A1 (en) 2025-06-12
CN121509648A (zh) 2026-02-10
CN113748682A (zh) 2021-12-03
US11252392B2 (en) 2022-02-15

Similar Documents

Publication Publication Date Title
US11743443B2 (en) Layered scene decomposition CODEC with layered depth imaging
US11968372B2 (en) Layered scene decomposition CODEC method

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application. Ref document number: 20769199; Country of ref document: EP; Kind code of ref document: A1
ENP Entry into the national phase. Ref document number: 3127545; Country of ref document: CA
ENP Entry into the national phase. Ref document number: 2021544756; Country of ref document: JP; Kind code of ref document: A
NENP Non-entry into the national phase. Ref country code: DE
ENP Entry into the national phase. Ref document number: 20217028928; Country of ref document: KR; Kind code of ref document: A
122 EP: PCT application non-entry in European phase. Ref document number: 20769199; Country of ref document: EP; Kind code of ref document: A1
WWG WIPO information: grant in national office. Ref document number: 202080016205.2; Country of ref document: CN