CN116648903A - Processing of extended dimension light field images

Publication number: CN116648903A
Application number: CN202180088332.8A
Authority: CN (China)
Language: Chinese (zh)
Inventor: R. Atkins
Assignee: Dolby Laboratories Licensing Corp
Legal status: Pending
Priority claimed from PCT/US2021/061683 (WO2022120104A2)
Prior art keywords: image, metadata, viewpoint, view, angular
Abstract

In one embodiment, the method, medium, and system process and display a light field image using a view function based on pixel locations in the image and based on the distance of the viewer from the display (the Z-position of the viewer). The view function may be an angular view function that specifies different angular views of different pixels in the light field image based on inputs that may include: the x or y pixel position in the image, the distance of the viewer from the display, and the angle of the viewer relative to the display. In one embodiment, light field metadata, such as angular range metadata and/or angular offset metadata, may be used to process and display images. In one embodiment, the color volume mapping metadata may be used to adjust the color volume mapping based on the determined angular view; and the color volume mapping metadata may also be adjusted based on the angular offset metadata.

Description

Processing of extended dimension light field images
Cross Reference to Related Applications
The present application claims priority from U.S. provisional application No. 63/121,372 and European patent application No. 20211870.9, both filed on December 4, 2020, each of which is incorporated herein by reference in its entirety.
Technical Field
The present disclosure relates to the field of image processing, and in particular to image processing of light field images.
Background
A recent development in the field of image processing and image display is called light field processing, which has the ability to display images at different viewpoints in both the horizontal and vertical directions from previously rendered volumetric content. These different directions are angles different from the classical "straight-on" viewing position in which the line between the viewer and the display is perpendicular to the surface of the display. This type of image processing and display is now referred to as 4D light field imaging, because the imaging can be described as a function of four values: the pixel location (e.g., x, y) in the previously rendered image, and the horizontal and vertical angles of the viewpoint. Further background information about 4D light field images is provided in the following articles: "Light Field Image Processing: An Overview", by Gaochang Wu et al., IEEE Journal of Selected Topics in Signal Processing, Vol. 11, No. 7, October 2017, pages 926-954; and "Light Field Rendering", by Marc Levoy and Pat Hanrahan, in Proc. of the 23rd Annual Conf. on Computer Graphics and Interactive Techniques, 1996, pages 31-42.
Disclosure of Invention
The present disclosure describes methods and apparatus for mapping light fields, such as 4D light fields. In one embodiment, the method, medium, and system process and display a light field image using a view function based on pixel locations in the image and on the viewer's distance from the display (the viewer's Z position). The view function may be an angular view function that specifies different angular views of different pixels in the light field image based on inputs that may include: the x or y pixel position in the image, the distance of the viewer from the display, and the position of the viewer relative to the display. In one embodiment, light field metadata, such as angular range metadata and/or angular offset metadata, may be used to enhance the processing and display of images. In one embodiment, color volume mapping metadata may be used to adjust the color volume mapping based on the determined angular view; and the color volume mapping metadata may also be adjusted based on the angular offset metadata.
In one embodiment, a method may include the operations of: receiving image data represented in a light field format, the image data comprising image data for a different view (e.g., a different reference view) for each of a plurality of pixels in an image; receiving a selection of a desired viewpoint relative to the image; and determining one or more views at each of the plurality of pixels using a view function that determines the one or more views based on spatial coordinates of each of the plurality of pixels, based on the desired viewpoint, and based on a distance between the desired viewpoint and the display. In one embodiment, the method may include the additional operations of: rendering an image based on the determined view; and displaying the rendered image in the determined view. The received and decoded image data in a light field format may be referred to as a baseband light field representation comprising different reference views, which may be used to construct additional views based on these different reference views. The baseband light field image format may be represented as a) a decoded planar format as tiles, each tile being one of the possible views, or b) an interleaved format.
In one embodiment, the baseband light field image is a 4D light field image that has been previously rendered as volumetric content, and a selection of the desired viewpoint is received from a user to see the image at the desired viewpoint. The view function may be an angular view function comprising a horizontal angle view function and a vertical angle view function; the horizontal angle view function may have inputs including: the distance between the desired viewpoint and the display, the horizontal spatial coordinates of the pixels, and the horizontal component of the desired viewpoint; the vertical angle view function may have inputs including: the distance between the desired viewpoint and the display, the vertical spatial coordinates of the pixels, and the vertical component of the desired viewpoint.
In one embodiment, the view function is defined with respect to a reference plane at a reference distance from the display such that the view function will determine the same view for all pixels in the image for any one viewpoint in the reference plane. For points of view outside the reference plane, the view function may determine different views for different pixels in the image. In one embodiment, the desired viewpoint is selected based on the estimated viewer position.
In one embodiment, a method may further comprise the additional operations of: receiving color volume mapping metadata; and applying a color volume mapping based on the determined view and the color volume mapping metadata. In one embodiment, the color volume mapping metadata is adjusted based on the desired viewpoint and angular offset metadata. In one embodiment, the angular offset metadata may be interpolated based on the desired viewpoint. In one embodiment, the color volume mapping metadata may vary from scene to scene or from image to image across a plurality of different images.
In one embodiment, a method may further comprise the additional operations of: the determined view is interpolated from a set of nearest available reference views in the image data at the desired viewpoint. In one embodiment, the interpolation may use bilinear interpolation from dense light field images that include many reference views.
In one embodiment, a method may further comprise the additional operations of: the view, which may be a desired view, is limited to the effective viewing zone. In one embodiment, the limit may include one of the following: (a) Hard clamping an invalid view to a view in the valid viewing zone, or (b) soft clamping an invalid view to a view in the valid viewing zone. In one embodiment, a method may further include an additional operation of receiving metadata including an angular range for determining an effective viewing zone.
Aspects and embodiments described herein may include a non-transitory machine readable medium that may store executable computer program instructions that, when executed, cause one or more data processing systems to perform the methods described herein when the computer program instructions are executed. The instructions may be stored in a non-transitory machine-readable medium, such as Dynamic Random Access Memory (DRAM), which is volatile memory, or non-volatile memory such as flash memory, or other forms of memory. Aspects and embodiments described herein may also take the form of a data processing system constructed or programmed to perform these methods. For example, a data processing system may be constructed with hardware logic for performing the methods, or may be programmed with a computer program for performing the methods.
Aspects and embodiments described herein may also include computer products and computer media such as: a computer program product comprising instructions which, when executed by a computer, cause the computer to perform any of the methods described in this disclosure, including the exemplary embodiments such as exemplary embodiments 1 to 15 below; and
a computer-readable storage medium comprising instructions that, when executed by a computer, cause the computer to perform any of the methods described in the present disclosure, including exemplary embodiments such as exemplary embodiments 1-15, below.
The above summary does not include an exhaustive list of all embodiments and aspects of the present disclosure. All systems, media, and methods may be practiced according to all suitable combinations of the aspects and embodiments summarized above and disclosed in the detailed description below.
Drawings
The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings in which like reference numerals refer to similar elements.
Fig. 1 shows an example of three different views from three different viewing positions or viewpoints of an image, such as a 4D light field image.
FIG. 2 shows an example of how a 4D light field image may be stored as a set of tiles (referred to as a decoded planar format); each tile is one of the possible views of the image and each view corresponds to a particular viewpoint or viewing position.
Fig. 3A shows an example of a viewing zone and a reference viewing position for a Full High Definition (FHD) image (with a pixel resolution of 1920 by 1080).
Figure 3B shows another example of a viewing zone and a rightmost viewing position of an FHD image.
Figure 3C shows another example of the viewing zone and the nearest point of view of the FHD image.
Fig. 4A, 4B, 4C and 4D illustrate examples of different viewing zones for different angular ranges of the light field image.
Fig. 5A shows an example of how an invalid viewpoint or viewing position is converted into a valid viewpoint of a light field image.
Fig. 5B shows another example of how an invalid viewpoint or viewing position is converted into a valid viewpoint of a light field image.
Fig. 5C shows an example of a soft clamp function that may be used to convert an invalid view into a valid view.
Fig. 5D shows an example of converting an invalid view into a valid view using a soft clamp function.
Fig. 6A shows a flow chart illustrating a method according to one embodiment.
Fig. 6B depicts a flow chart illustrating a method according to another embodiment.
FIG. 7 is a block diagram illustrating an example of a data processing system that may be used to implement one or more embodiments described herein.
Detailed Description
Various embodiments and aspects will be described with reference to details discussed below, and the accompanying drawings will illustrate the various embodiments. The following description and drawings are illustrative and should not be construed as limiting. Numerous specific details are described to provide a thorough understanding of the various embodiments. However, in some instances, well-known or conventional details are not described in order to provide a concise discussion of embodiments.
Reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment. The processes depicted in the figures below are performed by processing logic that comprises hardware (e.g., circuitry, dedicated logic, etc.), software, or a combination of both. Although the processes are described below in terms of some sequential operations, it should be appreciated that some of the operations described may be performed in a different order. Also, some operations may be performed in parallel rather than sequentially.
The present disclosure describes methods, non-transitory machine-readable media, and data processing systems that can map a light field (e.g., a 4D light field containing reference views) to different viewpoints of an image. The present disclosure begins with an overview specific to a 4D light field, and then describes a process for mapping such a light field for a particular viewpoint. Further aspects of using interpolation and using metadata are then described. It will be understood that the embodiments described herein may be combined in various different combinations intended to be covered by the appended claims.
In one embodiment, the light field (e.g., a 4D light field) may be a complete representation of the volumetric scene behind a planar surface (e.g., a planar surface of a display screen of a display device). The display device may display images of the scene at different viewpoints such that the viewers are presented with different images according to their viewing positions or viewpoints. The image may be pre-rendered volumetric content stored in a 4D light field format. FIG. 1 illustrates an example of a system 10 that displays different images at different viewpoints on a display 12, according to one embodiment. When the viewer is in a central position, display 12 shows image 14, which is a conventional image that is presented today in systems that do not use a light field. When the viewer moves to the left side of the display, the system displays an image 18 showing the appearance of the scene as seen from this point of view on the left side of the display 12. When the viewer moves to the right side of the display 12, the display 12 presents an image 16 that shows the appearance of the scene as seen from this point of view on the right side of the display 12.
The 4D light field may allow a viewer to look around within the scene, showing slightly different perspectives from each viewing position, somewhat like looking through a window to view a real scene. In one embodiment, slightly different perspectives may show different images, which may include specular highlights on snow, water, metal, skin, eyes, etc. that are revealed in one view but not in an adjacent view. In one embodiment, the light field may include occlusions within the viewing region, which change as the viewer moves slightly. These occlusions may be at the boundaries of the window or within the scene itself, with some objects being partially occluded by nearer objects. In one embodiment, the light field may support both optically captured content as well as graphics that have been rendered by a computer or other data processing system. The light field may effectively allow a viewer to move or walk around the scene to see different views of the scene by moving to different viewpoints that are valid for the scene. For example, in the case of fig. 1, the user may be able to walk around at least a portion of the vehicle to see the front of the vehicle and the right side and left side of the vehicle, or at least a portion of the right side and a portion of the left side of the vehicle, by changing the viewer's point of view.
The light field may be considered to have four dimensions including pixel position (e.g., x, y) and angular information (u, v); in addition, for each pixel location, there will be color information representing the color of the pixel at each possible view. Each pixel location may have multiple views (e.g., and thus multiple color values, one for each view) selected based on the angle information; in other words, the color information selected at the pixel location depends on angle information derived from the viewpoint selected by the user or determined by the system (e.g., the system estimates the location of the user). The first viewpoint will cause selection of a first view, which causes selection of first color information (corresponding to the first view) at a particular pixel, and the second viewpoint will cause selection of a second view, which selects second color information (corresponding to the second view) at the same pixel. A conventional image may be considered to have three dimensions represented by pixel locations (e.g., x, y) and color information "c" (thus, the conventional image may be represented by the symbol Im (x, y, c)). The additional information of the 4D light field is angular information (e.g., u, v) which is derived from a selected viewpoint of the viewing scene, and thus, the 4D light field image may be represented by a symbol Im (x, y, u, v, c).
The light field may have an angular resolution that may be defined as a number of views (also referred to as a number of angular views in some embodiments). In the example shown in fig. 2, the horizontal angular resolution is 5 (5 different views) and the vertical angular resolution is 5 (5 different views). The light field may also be defined as having an angular range (e.g., in degrees), which may be defined as the maximum angle in both the vertical and horizontal directions. In one embodiment, there is a specified angular range supported by the content, and the specified angular range may be determined during capture, creation, or rendering of the content and may be stored and transmitted as angular range metadata associated with the content, where the angular range may be in degrees. In one embodiment, there may be two angular range values, one for the maximum horizontal angle at which the light field can be accurately viewed and the other for the maximum vertical angle at which the light field can be accurately viewed. When the angular range is zero for both horizontal and vertical, there is no alternative view in either direction, and the content can only be viewed correctly from the position described herein as the reference viewing position; a single view from that reference viewing position is identical to the existing content of a conventional 2D system and identical to the image 14 shown in fig. 1. Another term that may be used to describe the angle information is angular density, which may be expressed as the ratio (angular resolution)/(angular range); the angular density may be in units of views per degree.
The 4D light field image can be conceptualized in two ways: planar light field images (shown in fig. 2) and interleaved light field images. The received and decoded 4D light field image may be referred to as a baseband light field image containing reference views, which may be used to create further views based on the reference views. In a planar light field image (a 4D light field image as shown in fig. 2), each tile or plane is an image corresponding to a different reference viewing position or viewpoint, and each of these tiles or planes is one of the reference views. The planar 4D light field may be well suited for spatial operations, such as compression and resizing, using existing image processing architectures described below. In the example shown in fig. 2, the angular resolution is 5 views by 5 views. The center tile 14 (e.g., at u=2, v=2) is a conventional 2D image corresponding to the reference viewing position described below. The angular range of this example in fig. 2 is 30° in both the horizontal and vertical directions. The leftmost tile along the center tile row (v=2) corresponds to image 18 in fig. 1 at the reference viewing distance described below, and the rightmost tile along the center tile row corresponds to image 16 in fig. 1 at the reference viewing distance described below. The other tiles in fig. 2 correspond to different viewing positions at different horizontal angles 20 and vertical angles 22 at the reference viewing distance described below. For example, the tile in the upper left corner shown in fig. 2 corresponds to a horizontal view at the leftmost viewpoint and a vertical view at the highest viewpoint in the vertical direction; the upper right corner tile shown in fig. 2 corresponds to a horizontal view at the rightmost viewpoint and a vertical view at the highest viewpoint in the vertical direction. The lower left corner tile shown in fig. 2 corresponds to a horizontal view at the leftmost viewpoint and a vertical view at the lowest viewpoint in the vertical direction; the lower right corner tile shown in fig. 2 corresponds to a horizontal view at the rightmost viewpoint and a vertical view at the lowest viewpoint in the vertical direction. The horizontal views for the pixels along the top row of tiles, from left to right in this representation of fig. 2, are: (a) for the first tile (the upper left corner tile): x0u0, x1u0, x2u0, x3u0, ...; (b) for the next tile (to the right of the first tile): x0u1, x1u1, x2u1, .... The vertical views for the pixels along the top row of tiles, from left to right in this representation of fig. 2, are: (a) for the first tile (the upper left corner tile): y0u0, y1u0, y2u0, y3u0, ...; (b) for the next tile (to the right of the first tile): y0u1, y1u1, y2u1, .... Thus, at the reference viewing distance described below, all pixels in the upper left tile will have the view shown in the upper left tile in fig. 2, and all pixels in the next tile (along the top row of tiles) will have the view shown in that next tile in fig. 2.
The 4D light field may also be represented as an interleaved image. In this representation, the angular views are interleaved horizontally and vertically so that adjacent pixels correspond to different view directions. Such a representation may facilitate angular processing, such as interpolation between viewpoints, because only a small connected region of the 4D light field image is needed to select or interpolate a new viewpoint (discussed later). The views of the pixels along the top row, from left to right in this representation, are: X0U0, X0U1, X0U2, X0U3, X0U4, X1U0, X1U1, ....
The planar 4D light field shares some similarity with the planar color representation of the image, where each dimension of color is represented as a complete image (e.g., R, R, R, G, G, G, B, B, B). Alternatively, the color images may also be represented in an interlaced fashion (e.g., R, G, B, R, G, B, R, G, B). Likewise, an interleaved 4D light field represents each view interleaved with other views. Lossless and efficient conversion between planar and interleaved formats can be achieved by indexing into the appropriate portion of system memory that is storing the light field images.
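As a minimal sketch of this indexing (Python/NumPy; the array layout of shape (V, U, Y, X, C), the reduced resolution, and the variable names are illustrative assumptions, not part of the format definition), the planar and interleaved orderings can be converted losslessly by a transpose:

import numpy as np

# Assumed planar layout: a grid of V x U tiles, each tile a Y x X x C image,
# stored as one array of shape (V, U, Y, X, C).  Resolution reduced for the sketch.
V, U, Y, X, C = 5, 5, 54, 96, 3
lf_planar = np.zeros((V, U, Y, X, C), dtype=np.float32)

# Interleaved layout: for each spatial position (y, x), the block of V x U views
# is stored contiguously, giving shape (Y, X, V, U, C).
lf_interleaved = lf_planar.transpose(2, 3, 0, 1, 4).copy()

# The conversion is lossless; transposing back recovers the planar form.
lf_planar_again = lf_interleaved.transpose(2, 3, 0, 1, 4)
assert lf_planar_again.shape == lf_planar.shape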
In this disclosure, many illustrations and methods focus on only the horizontal axis (x, u) for brevity. It will be appreciated that the same method is also applicable to the vertical axis (y, v), even if not explicitly indicated.
Viewing zone and reference viewing position
A viewing zone may be defined as an area in which a valid viewpoint may be rendered. In one embodiment, the viewing zone may be used to constrain views, as described further below. The effective viewing zone is a viewing zone of the image that is constrained and defined by angular range metadata of the image, and the angular range metadata (described further below) specifies the angular range through which the image can be viewed. A valid viewpoint is any viewpoint in the effective viewing zone; an invalid viewpoint is a viewpoint outside the effective viewing zone. The viewing zone may be divided into two regions 59 and 61 shown in figs. 3A, 3B and 3C, which illustrate the top view 50 in the horizontal direction of the display 51, and the viewing zone including the regions 59 and 61. In the example shown in figs. 3A, 3B, and 3C, the effective viewing zone is constrained to regions 59 and 61. Region 61 shows a viewing zone in which the 4D light field can be accurately viewed by an observer with normal vision or better, and this region 61 is beyond the reference viewing distance defined by reference viewing plane 56; in other words, region 61 has a viewer/observer distance from display 51 that is greater than the distance between reference viewing plane 56 and display 51 (shown on y-axis 53). Region 59 shows a viewing zone in which the 4D light field can still be accurately viewed but the viewer may observe individual pixels due to limited spatial resolution (since the viewer is closer to the display 51 than the reference viewing plane 56). The viewing zone is determined by the angular range and spatial resolution explained in more detail below. The reference viewing plane 56 separates the regions 59 and 61 and includes a reference viewpoint position 57 at the center of the reference viewing plane 56. The position along the x-axis 55 may be the position of the viewer (in the horizontal direction). It will be appreciated that the representation of the viewing zone in the vertical direction will be similar to the representations shown in figs. 3A, 3B and 3C.
The reference viewpoint position 57 at the center of the reference viewing plane 56 may be defined as being at the center of the screen (horizontally and vertically) and at a reference distance z0 at which the spatial resolution of the screen is 60.8 pixels per degree. For Full High Definition (FHD) resolution images (e.g., 1920 x 1080 images), this reference distance z0 is 3.2 picture heights (thus the distance between the reference plane 56 and the display 51 is 3.2 picture heights); as shown in figs. 3A, 3B, etc., the reference plane and the surface plane of the display are parallel to each other. This reference distance z0 separates viewing zones 59 and 61, and reference viewing plane 56 is located at a distance z0 from display 51. At larger distances, visual acuity is less than the screen resolution, and images can be presented with high visual fidelity. At closer distances, visual acuity is greater than the screen resolution, and the individual pixels comprising the image may be visible. The reference viewing distance (z0, in screen heights) can be calculated from the vertical spatial resolution Y of a plane (e.g., Y = 1080, 2160, or 4320) by the equation: z0 = 0.5/tan(Y/(2*60.8)).
As also shown in fig. 3A, there are three angles shown at positions horizontally across the display: at the leftmost edge, the center, and the rightmost edge. These illustrate the angle formed between the screen normal and the viewing position. These angles (in degrees) are calculated from the pixel position (x, y, in screen heights), the observer position (ObX, ObY, ObZ, in screen heights), and the reference viewing distance (z0, in screen heights), as described below:
thetafunu=atan((ObX-x)/(z0+ObZ));
thetafunv=atan((ObY-y)/(z0+ObZ))。
At the reference viewing position, these angles (shown as angles 63 alongside display 51) are -15.5 degrees, 0 degrees, and 15.5 degrees, respectively, assuming that the image has an aspect ratio of 16:9. Three horizontal view 65 indicators are also shown, indicating which angular view is presented to the viewer. The angular view is calculated by the horizontal view function u = ufun(x, ObX, ObZ) and the vertical view function v = vfun(y, ObY, ObZ), where (u, v) is the angular view, (x, y) are the spatial coordinates of each pixel in the image, and ObX, ObY, ObZ is the position of the observer relative to the reference position (where these positions are all zero). The angular view functions are described in more detail later. In one embodiment, at the reference viewing position 57, the horizontal view function ufun is defined such that ufun(x, 0, 0) = 0, or in other words, for all pixels, the horizontal angular view = 0 at the reference position, as shown in fig. 3A. In one embodiment, at the reference viewing position 57, the vertical view function vfun is defined such that vfun(y, 0, 0) = 0, or in other words, for all pixels, the vertical angular view = 0 at the reference position, as shown in fig. 3A.
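A small numeric check of these angle functions (Python; the function and variable names mirror the formulas above, and the degree-based evaluation and FHD example values are assumptions for illustration) reproduces the roughly 15.5-degree edge angles cited for the reference viewing position:

import math

def z0_from_rows(Y_rows):
    # Reference viewing distance in screen heights, from 60.8 pixels per degree.
    return 0.5 / math.tan(math.radians(Y_rows / (2 * 60.8)))

def thetafunu(x, ObX, ObZ, z0):
    # Horizontal angle (degrees) between the screen normal at pixel x and the observer.
    return math.degrees(math.atan((ObX - x) / (z0 + ObZ)))

z0 = z0_from_rows(1080)            # ~3.2 screen heights for FHD
ar = 16 / 9                         # screen width in screen heights
for x in (-ar / 2, 0.0, ar / 2):    # leftmost edge, center, rightmost edge
    print(round(thetafunu(x, 0.0, 0.0, z0), 1))
# Magnitudes match the +/-15.5 degrees cited above (which edge carries which sign
# depends on the direction chosen as positive).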
Viewing position at reference viewing plane and rightmost viewing position
Viewing positions along the line at z0 are referred to as viewing positions in the reference viewing plane 56. As the viewer moves laterally along this line in the reference viewing plane 56 (at a constant Z distance relative to the display 51), the viewer is presented with a new view of the scene from the 4D light field. In one embodiment, when a viewpoint is on the reference viewing plane, all pixels in the image will have the same view at that viewpoint. Fig. 3B illustrates one such viewpoint, the so-called rightmost viewpoint 67, located on the reference viewing plane. This viewpoint is defined as the point whose angle from the center of the image equals the angular range of the content.
As shown in fig. 3B, the angle between the screen normal and the rightmost viewpoint at the center of the screen is 30 degrees, which in this example is the angular range. As also shown in fig. 3B, the angles between the screen normal and the rightmost viewpoint are all different, ranging from 40.5 degrees for the leftmost pixel to 16.7 degrees for the rightmost pixel of the image. However, the angular view was previously defined to be the same for all positions on the reference viewing plane. This is done by further defining the view functions ufun(x, ObX, 0) = Un and vfun(y, ObY, 0) = Vn. In other words, the same view (Un, Vn) is calculated for all pixels (x, y) at any given possible observer position (ObX, ObY) along the reference plane (ObZ = 0), for a particular viewpoint in the plane 56. As shown in fig. 3B, this is satisfied because for each spatial position (x, y) in the image, the horizontal view = 2 (symbolically shown in fig. 3B by views 65 directly below display 51). The rightmost viewing position (ObXmax, in screen heights) is determined from the reference viewing distance (z0, in screen heights) and the angular range (AngularRange, in degrees). This corresponds to the view at the maximum angular range, Umax. For example: ObXmax = -z0/tan(AngularRange + 90).
Nearest viewing position
Fig. 3C shows an additional point of interest, called the nearest viewing position. This point is defined as the closest point within the viewing zone. As can be seen, the angle formed between the screen normal and the nearest viewpoint is 40.5 degrees for the leftmost pixel in the image, the same as for the rightmost viewpoint. For the center pixel and the rightmost pixel, the angles between the screen normal and the nearest viewpoint are zero and -40.5 degrees, respectively. The angular views corresponding to these angles are also labeled (see views 65) and are 2, 0 and -2 (Umax, 0, -Umax), respectively. This means that the appropriate image for the nearest viewing position is made up of multiple views from the 4D light field. The leftmost pixel is from the rightmost view (Umax = 2), the center pixel is from the center view (u = 0), and the rightmost pixel is from the leftmost view (-Umax = -2). The pixels in between are from intermediate views or are interpolated, as will be discussed later. Note that views from any viewing position not on the reference viewing plane are also made up of multiple views from the 4D light field in a similar manner. In other words, in one embodiment, if the viewpoint is not on the reference viewing plane 56, multiple views will be used for the pixels in the image, with different pixels in different spatial locations using different views.
The nearest viewing position (ObZmin, in screen heights) is calculated from the aspect ratio (ar), the rightmost viewing position (ObXmax, in screen heights), and the reference viewing distance (z0, in screen heights): ObZmin = z0/(1 + 2*ObXmax/ar) - z0.
Furthest viewing position
The furthest viewing position ObZmax (in screen height) is the furthest distance from which the 4D light field can be correctly perceived. In one embodiment, this is calculated from the aspect ratio (ar), the rightmost viewing position (ObXmax in terms of screen height), and the reference viewing distance (z 0 in terms of screen height):
if ObXmax < ar/2,
then ObZmax = z0/(1 - 2*ObXmax/ar) - z0;
otherwise
ObZmax=z0*2-z0;
Ending
For viewing zones where the rightmost distance is equal to or greater than the screen width, the furthest viewing position ObZmax is infinite or the maximum distance that can be represented in a fixed point representation.
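The viewing-zone quantities above can be collected into one helper. The sketch below (Python; the function name, default inputs, and the 5-degree example range are illustrative assumptions) follows the formulas as written for an FHD image:

import math

def viewing_zone(Y_rows=1080, ar=16/9, angular_range_deg=5.0):
    # Reference viewing distance (screen heights), from 60.8 pixels per degree.
    z0 = 0.5 / math.tan(math.radians(Y_rows / (2 * 60.8)))
    # Rightmost viewing position on the reference viewing plane.
    ObXmax = -z0 / math.tan(math.radians(angular_range_deg + 90.0))
    # Nearest viewing position.
    ObZmin = z0 / (1 + 2 * ObXmax / ar) - z0
    # Furthest viewing position.
    if ObXmax < ar / 2:
        ObZmax = z0 / (1 - 2 * ObXmax / ar) - z0
    else:
        ObZmax = z0 * 2 - z0
    return z0, ObXmax, ObZmin, ObZmax

print(viewing_zone())  # approximately (3.2, 0.28, -0.77, 1.47) for a 5-degree range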
Influence of angular Range on viewing Range
The angular range corresponds to the maximum angle from the center of the screen to the rightmost viewing position. Fig. 4A, 4B, 4C, and 4D show examples of comparing angular ranges of 0 degrees, 5 degrees, 15.5 degrees, and 30 degrees, respectively, and the rightmost viewing position of each of these examples.
In the zero-degree angular range shown in fig. 4A, the viewing zone is only a single point corresponding to the reference viewpoint. This is the case for today's 2D images, where an image is produced that is viewed with the correct viewing angle from a single viewing position. At any other viewing position outside the viewing zone, the viewing angle becomes incorrect and the human visual system infers that the image is located on the 2D plane, rather than being a real scene "behind" the 2D plane. When the angular resolution is a single view (U = 1 and V = 1), the angular range must be zero and the 4D light field collapses to the same as a conventional 2D image.
Within the 5 degree angular range shown in fig. 4B, the viewing zone (including regions 59A and 61A) slightly increases to a diamond shape. Within this viewing zone, the viewpoint may be slightly shifted and images with the correct viewing angle may be calculated and shown to mimic the experience of looking through the window. The rightmost viewpoint 71 on the reference viewing plane 56 will provide the same view (view 2) for all pixels in the image.
In the case of the angular range of 15.5 degrees shown in fig. 4C, the viewing zone matches the size of the image (the rightmost viewing position 73 corresponds to the width of the image) and can be viewed from any distance at the correct viewing angle. The viewing zone (including regions 59B and 61B) increases as the angular extent increases. In the case of the 30 degree angular range shown in fig. 4D, the viewing zone (including regions 59C and 61C) is larger than the image in most positions of the viewing zone; the rightmost viewing position 75 exceeds the rightmost pixel on the display 51.
Angular view function
The horizontal view function and the vertical view function may be used to calculate the angular view for a given viewing position: u = ufun(x, ObX, ObZ) and v = vfun(y, ObY, ObZ), where (u, v) is the angular view, (x, y) are the spatial coordinates of each pixel in the image, and ObX, ObY, ObZ is the position of the observer relative to the reference position (at which these positions are all zero).
The following constraints on the angular view functions may be further specified to ensure that each view of the planar 4D light field image corresponds to an image having the correct perspective on the reference viewing plane. With these constraints, views for all viewpoints on the reference plane may each be constructed, and these views may be used as reference views for constructing views for viewpoints outside the reference viewing plane.
Reference viewpoint
- ufun(x, 0, 0) = 0 (u = 0 for all pixels x in the image)
- vfun(y, 0, 0) = 0 (v = 0 for all pixels y in the image)
Reference viewing plane
- ufun(x, ObX, 0) = Un (u = Un for all pixels x in the image)
- vfun(y, ObY, 0) = Vn (v = Vn for all pixels y in the image)
Outermost viewpoint
- ufun(x, ObXmax, 0) = Umax (u = Umax for all pixels x in the image)
- vfun(y, ObYmax, 0) = Vmax (v = Vmax for all pixels y in the image)
In one embodiment, the following view function meets the criteria described above:
ufun(x,ObX,ObZ)=(thetafunu(x,ObX,ObZ)+(thetafunu(0,(ObX-x)*z0/(z0+ObZ)+x,0)-thetafunu(x,(ObX-x)*z0/(z0+ObZ)+x,0)))*Umax/AngularRangeU;
vfun(y,ObY,ObZ)=(thetafunv(y,ObY,ObZ)+(thetafunv(0,(ObY-y)*z0/(z0+ObZ)+y,0)-thetafunv(y,(ObY-y)*z0/(z0+ObZ)+y,0)))*Vmax/AngularRangeV;
in one embodiment, this reduces to the following set of horizontal and vertical view functions:
ufun(x, ObX, ObZ) = atan((ObX - x)/(ObZ + z0) + x/z0)*Umax/AngularRangeU
vfun(y, ObY, ObZ) = atan((ObY - y)/(ObZ + z0) + y/z0)*Vmax/AngularRangeV. In this example of these view functions, AngularRangeU is the horizontal angular range, which can be specified by the angular range metadata of the image, and AngularRangeV is the vertical angular range, which can be specified by the angular range metadata of the image; also in this example, Umax is the horizontal angular resolution of the image and Vmax is the vertical angular resolution of the image. Umax and Vmax may also be specified in the metadata of the image.
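A direct transcription of these view functions (Python; the default values z0 = 3.2, Umax = Vmax = 2, and a 30-degree angular range are the example values from the earlier figures, and degree-based evaluation is an assumption) also checks the reference-plane constraint, i.e., that every pixel returns the same view for a viewpoint on the reference plane:

import math

def ufun(x, ObX, ObZ, z0=3.2, Umax=2, AngularRangeU=30.0):
    # Horizontal angular view for pixel x (screen heights from center) and observer (ObX, ObZ).
    return math.degrees(math.atan((ObX - x) / (ObZ + z0) + x / z0)) * Umax / AngularRangeU

def vfun(y, ObY, ObZ, z0=3.2, Vmax=2, AngularRangeV=30.0):
    # Vertical angular view, defined symmetrically.
    return math.degrees(math.atan((ObY - y) / (ObZ + z0) + y / z0)) * Vmax / AngularRangeV

# On the reference viewing plane (ObZ = 0) every pixel maps to the same view:
for x in (-0.888, 0.0, 0.888):
    print(round(ufun(x, ObX=1.85, ObZ=0.0), 2))      # ~2.0 (Umax) for all pixels
# At the nearest viewing position (ObZ ~ -2.16) the view varies with pixel position:
print(round(ufun(-0.888, ObX=0.0, ObZ=-2.16), 2),
      round(ufun(0.888, ObX=0.0, ObZ=-2.16), 2))      # ~ +2 and -2, cf. fig. 3C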
Interpolation of diagonal views and dense 4D light fields
For many viewing positions (especially for viewpoints outside the reference viewing plane), the angular view function may return a fractional value. This means that the desired view is located somewhere between adjacent views, so the correct view for such a viewing position must be interpolated from the existing views (tiles or views as shown in fig. 2). The process of creating an image by interpolating between 2 or more images is known in the art; however, to perform such interpolation, it is desirable to have a light field image with high angular density, which is referred to in this disclosure as a dense 4D light field image. A dense 4D light field is defined as having an angular density that is high enough that the differences between adjacent views are not, or are hardly, perceptible to the eye at the reference viewing plane. This occurs when a lateral shift of the viewpoint equal to a single pixel size of the image (ObXmin = ar/X) causes the angular view to increment by close to 1.0. This happens at the reference viewing position when:
Angular density = 1/atan(ar/X, z0)
With the angular density defined above, the same lateral shift ObXmin around the rightmost viewing position may not yield an angular view increment of exactly 1.0, so the angular density is preferably calculated for the worst-case viewing position:
A1 = atan(ObXmax/z0)
A2 = atan((ObXmax - ObXmin)/z0)
Angular density = 1/(A2 - A1)
At viewing distances greater than the reference plane, the same lateral shift ObXmin may produce an angular view increment of less than 1.0. However, this is expected because the vision of the observer is degraded at a larger viewing angle.
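A quick numeric check of these density formulas is sketched below (Python; interpreting the two-argument atan as atan2, working in degrees, and using the FHD/30-degree example values are assumptions):

import math

z0, ar, X = 3.2, 16/9, 1920
ObXmin = ar / X                               # lateral shift of one pixel, in screen heights
ObXmax = z0 * math.tan(math.radians(30.0))    # rightmost viewing position for a 30-degree range

# Density needed at the reference viewing position.
density_ref = 1.0 / math.degrees(math.atan2(ar / X, z0))
# Density needed at the worst-case (rightmost) viewing position.
A1 = math.degrees(math.atan(ObXmax / z0))
A2 = math.degrees(math.atan((ObXmax - ObXmin) / z0))
density_worst = 1.0 / abs(A2 - A1)
print(round(density_ref, 1), round(density_worst, 1))  # roughly 60 and 46 views per degree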
The simplest form of angular view interpolation is the nearest neighbor, where the Interpolated View (IV) is calculated from:
IV=4DLF(x,y,floor(ufun(x,ObX,ObZ)+0.5),floor(vfun(y,ObY,ObZ)+0.5))
in case of a dense 4D light field with a sufficiently high angular resolution, such interpolation may produce a fluent visual experience with viewpoint changes.
In one embodiment, a better (smoother) approach is to use bilinear interpolation, where the nearest two, three, or more angular views are blended into each rendered view with weights based on linear distance. With bilinear interpolation, the horizontal views may be interpolated first, followed by the vertical views, or vice versa. An example of bilinear interpolation using two views in each direction is:
IV1=4DLF(x,y,floor(ufun(x,ObX,ObZ)),floor(vfun(y,ObY,ObZ)))
IV2=4DLF(x,y,floor(ufun(x,ObX,ObZ)+1),floor(vfun(y,ObY,ObZ)))
IV3=4DLF(x,y,floor(ufun(x,ObX,ObZ)),floor(vfun(y,ObY,ObZ)+1))
IV4=4DLF(x,y,floor(ufun(x,ObX,ObZ)+1),floor(vfun(y,ObY,ObZ)+1))
AlphaU=ufun(x,ObX,ObZ)-floor(ufun(x,ObX,ObZ))
AlphaV=vfun(y,ObY,ObZ)-floor(vfun(y,ObY,ObZ))
IVU1=IV1*(1-AlphaU)+IV2*AlphaU
IVU2=IV3*(1-AlphaU)+IV4*AlphaU
IV=IVU1*(1-AlphaV)+IVU2*AlphaV
In case of a dense 4D light field with a sufficiently high angular resolution, such interpolation may result in a smoother visual experience with viewpoint changes. Bilinear interpolation using three or more views may lead to more consistent sharpness between multiple views, as some level of interpolation or view combination is always applied.
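The bilinear case can be sketched on a planar light field array as follows (Python/NumPy; the (V, U, Y, X, C) layout, the clamping of view indices, and the assumption that the fractional view maps have already been offset into array-index space 0..U-1 are all illustrative choices, and the per-pixel u, v values would come from the angular view functions above):

import numpy as np

def interp_view(lf, u, v):
    # lf has shape (V, U, Y, X, C); u, v are (possibly fractional) per-pixel view
    # maps of shape (Y, X).  Returns a rendered image of shape (Y, X, C).
    Vn, Un = lf.shape[0], lf.shape[1]
    u0 = np.clip(np.floor(u).astype(int), 0, Un - 2)
    v0 = np.clip(np.floor(v).astype(int), 0, Vn - 2)
    au = (u - u0)[..., None]           # AlphaU, broadcast over color channels
    av = (v - v0)[..., None]           # AlphaV
    yy, xx = np.indices(u.shape)
    iv1 = lf[v0, u0, yy, xx]           # nearest four reference views
    iv2 = lf[v0, u0 + 1, yy, xx]
    iv3 = lf[v0 + 1, u0, yy, xx]
    iv4 = lf[v0 + 1, u0 + 1, yy, xx]
    ivu1 = iv1 * (1 - au) + iv2 * au   # blend horizontally, then vertically
    ivu2 = iv3 * (1 - au) + iv4 * au
    return ivu1 * (1 - av) + ivu2 * av

lf = np.random.rand(5, 5, 54, 96, 3).astype(np.float32)
u = np.full((54, 96), 1.3)
v = np.full((54, 96), 2.7)
img = interp_view(lf, u, v)            # (54, 96, 3) rendered view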
In some embodiments, interleaved 4D light fields may be useful when interpolating between views, because for each spatial position x, y, the nearest angular view (u, v) is stored in an adjacent memory location, thereby making retrieval efficient. For example, when interpolating between three adjacent views, the layout of the data in memory may allow the system to read a sequence of consecutive address locations of the three views from memory to obtain the data needed to perform the interpolation.
More advanced forms of interpolation are also possible, and may be useful for non-dense (or sparse) 4D light fields in some embodiments. One such interpolation, known in the art for use in interpolating frame rates, is the family of motion estimation motion compensation techniques. These techniques attempt to align features between adjacent views and then create interpolated views by shifting or warping the adjacent views. Such techniques are known in the art and are used by services such as Intel True View. These techniques may be used with the embodiments described herein to interpolate between views to obtain interpolated pixel color data.
Further improvements to interpolation may also be possible by considering a 4D light field video comprising a plurality of 4D light field images separated by a certain time t (e.g. a time based on a frame rate of 30 frames per second or other frame rate).
Constraining angular views to effective viewing range
For viewing positions outside the effective viewing range, the angular view function may return angular views that exceed the range (e.g., beyond Umax or -Umax) included in the image. In one or more embodiments, this may be addressed by constraining the observer positions ObX, ObY, and ObZ to the effective viewing range prior to determining the angular view. In one embodiment, the view closest to the viewer location (within the effective viewing range) is selected as the rendered view to prevent abrupt changes in the angular view as the viewer crosses the boundary between the effective viewing zone and the outside of the zone.
The angular view may be constrained as illustrated in figs. 5A and 5B. Referring now to fig. 5A, the actual viewer position 91 (ObX, ObY, ObZ) is outside the effective viewing zone that includes the regions 59D and 61D, and the actual viewer position 91 is constrained to the effective viewing zone by selecting a constrained viewpoint 93 (ObXc, ObYc, ObZc) and using the constrained viewpoint 93 to determine the view to be provided. In the case of fig. 5B, the actual viewer position 95 is outside the effective viewing zone (and in front of the nearest viewing position), and the actual viewer position 95 is constrained to the effective viewing zone by selecting a constrained viewpoint 97 (ObXc, ObYc, ObZc) and using the constrained viewpoint 97 to determine the view to be provided. Within the effective viewing range, the intended viewing angle is rendered, ensuring that the viewing angle shifts with the viewer position in a natural way, and the angular view outside the viewing zone is the nearest available view within the effective viewing range. In one embodiment, this method works by finding the intersection of the line formed between the viewer position 91 (at ObX, ObY, ObZ) and the reference viewing position 57 with the line formed by the boundary of the effective viewing zone. This intersection is shown in fig. 5A as being at the constrained viewpoint 93 and in fig. 5B as being at the constrained viewpoint 97. In one embodiment, the intersection is calculated in two steps.
The first step determines the closest point on the boundary of the active viewing zone, which is located on a line between the viewer and the reference viewing position:
if ObZ > 0 && ObXmax <= ar/2,
then ObXc = sign(ObX)*ObZmax/(ObZmax/ObXmax + ObZ/abs(ObX));
if abs(ObX) > 0,
then ObZc = ObZ/ObX*ObXc;
otherwise
ObZc = ObZmax;
Ending
Otherwise if ObZ > 0 && ObXmax > ar/2,
then ObXc = 2*ObXmax*z0/(ObZ*ar - 2*ObZ*ObXmax + 2*abs(ObX)*z0);
if ObXc/ObX < 0,
then ObXc = ObX;
ObZc = ObZ*2;
otherwise if abs(ObX) > 0,
then ObZc = ObZ/ObX*ObXc;
otherwise
ObZc = ObZ*2;
Ending
Otherwise if ObZ <= 0,
then ObXc = sign(ObX)*ObZmin/(ObZmin/ObXmax + ObZ/abs(ObX));
if abs(ObX) > 0,
then ObZc = ObZ/ObX*ObXc;
otherwise
ObZc = ObZmin;
Ending
Ending
The second step limits the viewer position to within the effective viewing range and creates a constrained viewing position (also referred to as a constrained viewpoint).
ObXc = sign(ObX)*min(abs(ObX), abs(ObXc));
ObZc = sign(ObZ)*min(abs(ObZ), abs(ObZc));
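The two clamping steps above can be transcribed into Python for the horizontal/depth plane as a sketch (the guards for ObX = 0 and the example geometry values from the 5-degree case are added assumptions; variable names follow the pseudocode):

import math

def clamp_viewpoint(ObX, ObZ, ObXmax, ObZmin, ObZmax, ar, z0):
    sign = lambda a: math.copysign(1.0, a) if a != 0 else 0.0
    # Step 1: nearest boundary point on the line from the observer to the reference position.
    if ObZ > 0 and ObXmax <= ar / 2:
        ObXc = sign(ObX) * ObZmax / (ObZmax / ObXmax + ObZ / abs(ObX)) if ObX != 0 else 0.0
        ObZc = ObZ / ObX * ObXc if abs(ObX) > 0 else ObZmax
    elif ObZ > 0 and ObXmax > ar / 2:
        ObXc = 2 * ObXmax * z0 / (ObZ * ar - 2 * ObZ * ObXmax + 2 * abs(ObX) * z0)
        if ObX != 0 and ObXc / ObX < 0:
            ObXc, ObZc = ObX, ObZ * 2
        elif abs(ObX) > 0:
            ObZc = ObZ / ObX * ObXc
        else:
            ObZc = ObZ * 2
    else:  # ObZ <= 0
        ObXc = sign(ObX) * ObZmin / (ObZmin / ObXmax + ObZ / abs(ObX)) if ObX != 0 else 0.0
        ObZc = ObZ / ObX * ObXc if abs(ObX) > 0 else ObZmin
    # Step 2: limit the observer to the effective viewing range.
    ObXc = sign(ObX) * min(abs(ObX), abs(ObXc))
    ObZc = sign(ObZ) * min(abs(ObZ), abs(ObZc))
    return ObXc, ObZc

# An outside viewpoint is pulled onto the zone boundary: approximately (0.26, 0.10).
print(clamp_viewpoint(ObX=2.5, ObZ=1.0, ObXmax=0.28, ObZmin=-0.77, ObZmax=1.47, ar=16/9, z0=3.2))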
In alternative embodiments, the viewing position may be further modified to achieve a soft clamp at the boundaries of the effective viewing range. In one embodiment, this is accomplished by smoothly compressing the viewing position within the effective viewing zone toward the reference position. As the observer moves away from the reference viewing position, the viewpoint changes less and less as they approach the boundary of the effective viewing zone. This results in a less natural experience compared to looking through a window, but it avoids abrupt transitions at the edges of the effective viewing zone and also increases the size of the zone in which different viewing angles can be observed based on viewing position.
In one embodiment, the operation for applying soft clamping is:
1) Determining a boundary of the active viewing zone closer to the viewer than any other boundary of the active viewing zone, and determining a point on the boundary of the active viewing zone that is located on a line between the viewer and the reference viewing position (same as before, but without a second clamping phase)
2) The angular views in the region near the effective viewing range are compressed toward the reference position, as described below:
a. determining a distance from a reference viewing position to an observer
R0=sqrt(ObX^2+ObZ^2);
b. Determining a distance from a reference viewing position to an edge of an active viewing zone
RV=sqrt(ObXc^2+ObZc^2);
c. Determining the ratio of the observer distance to the edge distance
R0=R0/RV;
d. Soft-map cutoffs c1 and c2 are defined. These are relative distances of the observer compared to the effective viewing range: below c1 the mapping is linear, between c1 and c2 is a compression region, and beyond c2 the position is held at the boundary of the effective viewing zone. These may be defined in a configuration file or alternatively transmitted via metadata for a particular piece of content.
c1=0.5;
c2=2;
e. Coefficients of the cubic spline compression region are calculated. These coefficients are calculated such that the slope of the function is 1 at c1 and 0 at c2.
d1=(c1+c2-2)/(c2-c1)^3;
d2=-(c1+2*c2-3)/(c2-c1)^2;
f. Applying the soft clamp
If R0 > 0,
then R = R0;
if R0 >= c2,
then R = 1;
otherwise if R0 > c1,
then R = d1*(R - c1)^3 + d2*(R - c1)^2 + (R - c1) + c1;
Ending
ObXc=ObX*R/R0;
ObZc=ObZ*R/R0;
Otherwise
ObXc=0;
ObZc=0;
Ending
The function described above uses cubic splines as soft compression functions, as illustrated in fig. 5C. Other functions having similar shapes may alternatively be used. The black dashed line in fig. 5C represents the clamping method described above (without soft clamping). Curve 103 shows the soft clamp function. Points 107 and 108 show the c1 and c2 positions, respectively, and point 105 shows how viewer positions outside the effective viewing range are mapped to different viewpoints within the effective viewing range. Fig. 5D shows an example of clamping the viewpoint 115 outside the effective viewing range to the viewpoint 117 within the effective viewing range using soft clamping; note that the viewpoint 117 is not at the edge of the effective viewing range, but is displaced a short distance from the edge.
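A compact version of this cubic-spline soft clamp is sketched below (Python; the default cutoffs c1 = 0.5 and c2 = 2 follow the pseudocode, while the boundary point (ObXc, ObZc) supplied as input and the example values are assumptions):

import math

def soft_clamp(ObX, ObZ, ObXc, ObZc, c1=0.5, c2=2.0):
    # Distances from the reference position to the observer and to the zone edge.
    R0 = math.hypot(ObX, ObZ)
    RV = math.hypot(ObXc, ObZc)
    if R0 <= 0 or RV <= 0:
        return 0.0, 0.0
    r = R0 / RV                                # relative distance to the zone edge
    # Cubic-spline coefficients: slope 1 at c1, slope 0 (and value 1) at c2.
    d1 = (c1 + c2 - 2) / (c2 - c1) ** 3
    d2 = -(c1 + 2 * c2 - 3) / (c2 - c1) ** 2
    if r >= c2:
        R = 1.0                                # held at the zone boundary
    elif r > c1:
        R = d1 * (r - c1) ** 3 + d2 * (r - c1) ** 2 + (r - c1) + c1
    else:
        R = r                                  # linear (unmodified) region
    return ObX * R / r, ObZ * R / r

# A viewpoint just outside the zone is pulled slightly inside the edge (~ (0.24, 0.09))
# rather than pinned onto it as with the hard clamp.
print(soft_clamp(ObX=0.3, ObZ=0.12, ObXc=0.26, ObZc=0.104))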
Color volume mapping
Once the correct angular view has been obtained, it may also be desirable to perform color volume mapping for the selected angular view to map the dynamic range of the light field image into the capabilities of the display. This may be done using techniques used in Dolby Vision, such as the color volume mapping process. This process, as described in various prior Dolby patents (see, e.g., U.S. patent No. 10,600,166), may use metadata to guide the mapping process. This metadata may be based on an image or a set of images, e.g., a set of images sharing the same overall color balance and color range and dynamic range in a scene. The metadata may vary from scene to scene or even from image to image, depending on the level of control desired to render the image as accurately as possible. Different scenes may have different amounts of dynamic range and different color balances and color ranges, and thus the metadata may change based on these different scenes. Similarly, different views in the same light field image may also have different amounts of dynamic range and different color balances and color ranges. In one embodiment, a color volume mapping (CVM) metadata set may be provided for the light field image, and this CVM metadata may be adjusted based on the selected view, rather than providing separate CVM metadata for each of the different possible views.
When applying color volume mapping to views rendered from 4D light field images, embodiments may use a process similar to the Dolby Vision process mentioned above. In one embodiment, additional metadata fields that allow for adjustment of the mapping based on viewing position may be included in the 4D light field. For example, from the reference viewing position, the image shown on the display may be dark. However, as the viewer moves to the rightmost viewing position, bright objects such as windows or light sources may appear, changing the characteristics of the image and thus the optimal color volume mapping. This works in a similar way to human vision: if one looks toward a window, the image on the retina will be brighter, causing the visual system to adjust its exposure (also called adaptation).
In one embodiment, metadata (e.g., CVM metadata) included in the 4D light field may be adjusted based on the angular view function by:
1) Loading metadata corresponding to a reference viewing position
2) Angular offset metadata corresponding to at least one additional viewing position is loaded. In one example, this includes nine offsets of offset metadata related to the average luminance of the frame, corresponding to the extreme viewing angles (u = -Umax, 0, Umax, and v = -Vmax, 0, Vmax). For example, the angular offset metadata may have a value indicating that the average image luminance is 0.1 brighter when viewed from the rightmost viewing position than from the reference viewing position (corresponding to the previously mentioned bright window). The resolution of the angular offset metadata may match the angular resolution of the 4D light field image, or the resolution of the angular offset metadata may be smaller. The angular range of the angular offset metadata should match the angular range of the 4D light field image such that the rightmost offset metadata pairs with the rightmost angular viewing position. The nine offsets in this example may be:
[0 0 0.1]
[0 0 0.1]
[0 0 0.1]
3) The angular offset metadata is interpolated based on the angular view. This uses the same ufun and vfun calculations previously described to determine the modified angular offset metadata. The angular offset metadata may then be averaged to calculate a single offset metadata value to be applied to the entire frame. Alternatively, angular offset metadata may be calculated for different spatial regions of the image and used to vary the color volume mapping spatially across the image.
4) The interpolated angular offset metadata is then added to the metadata corresponding to the reference viewing position to determine a final metadata value.
5) The final metadata values are used to apply the color volume mapping.
This example describes a procedure for adjusting only a single metadata field (i.e., offset metadata related to the average brightness of a frame). However, this example may also be applied to other metadata fields.
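One way the nine-entry offset grid above might be interpolated and applied is sketched below (Python/NumPy; the bilinear sampling over the (u, v) view, the base value 0.25 for the reference-position metadata, and the single "average luminance offset" field are illustrative assumptions):

import numpy as np

# 3 x 3 grid of offsets at views v = -Vmax, 0, +Vmax (rows) and u = -Umax, 0, +Umax (cols);
# the rightmost column is +0.1, matching the example above.
offset_grid = np.array([[0.0, 0.0, 0.1],
                        [0.0, 0.0, 0.1],
                        [0.0, 0.0, 0.1]])
Umax, Vmax = 2.0, 2.0
base_avg_luminance = 0.25          # metadata for the reference viewing position (assumed value)

def interp_offset(u, v):
    # Map the angular view (u, v) into grid coordinates and sample bilinearly.
    gu = (u + Umax) / (2 * Umax) * 2.0          # 0..2 across the three columns
    gv = (v + Vmax) / (2 * Vmax) * 2.0
    u0 = int(np.clip(np.floor(gu), 0, 1))
    v0 = int(np.clip(np.floor(gv), 0, 1))
    au, av = gu - u0, gv - v0
    row0 = offset_grid[v0, u0] * (1 - au) + offset_grid[v0, u0 + 1] * au
    row1 = offset_grid[v0 + 1, u0] * (1 - au) + offset_grid[v0 + 1, u0 + 1] * au
    return row0 * (1 - av) + row1 * av

# Viewer halfway toward the rightmost view: half of the +0.1 offset is applied.
final_avg_luminance = base_avg_luminance + interp_offset(1.0, 0.0)
print(final_avg_luminance)          # 0.30 in this example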
Exemplary methods and implementation considerations
With simultaneous reference to fig. 6A and 6B, two examples of methods using one or more embodiments described herein will now be provided. The method illustrated in fig. 6A may begin in operation 201. In operation 201, a data processing system may receive a light field image, such as a 4D light field image, having multiple possible views of the image. In one embodiment, the data processing system may receive a sequence of light field images, such as animated content or a movie. In operation 203, the data processing system may also receive optional metadata related to the view in the image; the selectable metadata may include color volume map metadata, angular range metadata, and angular offset metadata. As described above, the angular offset metadata may be used to adjust color volume mapping metadata used to perform color volume mapping depending on the particular view selected. In operation 205, the data processing system may receive a selection of a desired viewpoint. For example, a user may select a particular viewpoint using a user interface provided by the data processing system; alternatively, the data processing system may determine the desired viewpoint based on an estimate of the user's position in the environment surrounding the light field image. Then, in operation 207, the data processing system may determine one or more views at each pixel location using a view function that determines the views as a function of: the desired viewpoint, and pixel location, and the distance between the desired viewpoint and the display. Then, in operation 209, the data processing system may render an image based on the view determined in operation 207. When multiple light field images are received, each light field image may have a desired viewpoint, which in turn is used to determine a view based on a view function. The data processing system may then display the rendered image in operation 211.
In the method illustrated in fig. 6B, in operation 251, the data processing system may receive one or more light field images and associated metadata; the data processing system may also receive a selection of a desired viewpoint, which may be a selection by the user or a selection performed by the data processing system based on, for example, an estimated location of the user. Then, in operation 253, the data processing system determines whether the desired viewpoint is a valid viewpoint based on the effective viewing range of the particular light field image. For example, the data processing system may determine that the viewpoint is outside of the effective viewing range, in which case the data processing system performs operation 255 to clamp the desired viewpoint to a valid viewpoint using one or more of the embodiments described herein. If operation 253 determines that the viewpoint is valid, or operation 255 creates a valid viewpoint, processing may continue with operation 257. In operation 257, the data processing system may determine the view to render for each pixel location in the image using the angular view functions described herein, with the desired viewpoint (which may be a constrained viewpoint) as the current input. In some cases, the view determined by the view function may be interpolated from existing views (e.g., neighboring or nearby views at the desired viewpoint). For example, in one embodiment bilinear interpolation may be used to interpolate among the nearest views to obtain the appropriate view and (therefore) the pixel color value at each pixel location. Then, in operation 261, the data processing system may adjust the color volume mapping metadata using the angular offset metadata and apply the color volume mapping. Then, in operation 263, the data processing system may render the image based on the determined views and the final metadata, and then display the rendered image.
The angular view functions described herein may be implemented using a three-dimensional (3D) look-up table or using the functional forms described herein. In one embodiment, the arctangent function may be replaced with a close approximation as known in the art. The soft compression function may be implemented as a one-dimensional look-up table or using the functional form described herein. For content with high angular resolution, the amount of data may be very large, and in some applications storing the entire light field image in DRAM may be prohibitive. In this case, it may be desirable to store the light field images in an interleaved format, interpolate each angular view, and perform color volume mapping on a few pixels at a time (stored in DRAM) rather than storing the entire image in DRAM. It may also be desirable to compress light field images, particularly for distribution over a network or a set of networks (e.g., the internet). Such compression may take advantage of the high correlation between neighboring views and may be performed using JPEG, JPEG 2000, HEVC, AVC, VVC, etc., or alternatively using MPEG-I.
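As one illustration of the implementation notes above, the arctangent can be replaced by a short polynomial approximation (a well-known form accurate to a fraction of a degree for |x| ≤ 1), and the soft compression function can be tabulated once into a one-dimensional look-up table and evaluated by linear interpolation; the tanh curve and the table size used here are assumptions for the sketch, not values taken from this disclosure.

```python
import numpy as np

def fast_atan(x):
    # Polynomial approximation of arctan for |x| <= 1 (a hardware-friendly
    # form); for |x| > 1 use the identity atan(x) = pi/2 - atan(1/x).
    return x * (np.pi / 4 + 0.273 * (1.0 - np.abs(x)))

def build_soft_compress_lut(max_view, n_entries=1024):
    # One-dimensional LUT for a soft-compression curve; the tanh shape
    # and the table size are illustrative choices only.
    x = np.linspace(-2.0 * max_view, 2.0 * max_view, n_entries)
    return x, max_view * np.tanh(x / max_view)

def soft_compress(value, lut):
    # Evaluate the curve by linear interpolation into the table.
    x, y = lut
    return np.interp(value, x, y)
```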
FIG. 7 illustrates an example of a data processing system 800 that may be used with one or more embodiments described herein. For example, system 800 may be used to perform any of the methods or calculations described herein, such as the methods shown in fig. 6A and 6B. The data processing system may also create a light field image with associated metadata for consumption by the client system. Note that while fig. 7 illustrates various components of a device, it is not intended to represent any particular architecture or manner of interconnecting the components, as such details are not pertinent to the present disclosure. It will also be appreciated that network computers and other data processing systems or other consumer electronic devices (having fewer components or possibly more components) may also be used with embodiments of the present disclosure.
As shown in fig. 7, device 800, which is one form of data processing system, includes a bus 803 coupled to one or more microprocessors 805 and ROM (read only memory) 807 and volatile RAM 809 and nonvolatile memory 811. One or more microprocessors 805 may fetch instructions from memory 807, 809, 811 and execute the instructions to perform the operations described above. The one or more microprocessors 805 may include one or more processing cores. Bus 803 interconnects these various components together and also interconnects these components 805, 807, 809, and 811 to display controller and display device 813 and to peripheral devices such as input/output (I/O) devices 815 (which may be touch screens, mice, keyboards, modems, network interfaces, printers, and other devices known in the art). Typically, input/output devices 815 are coupled to the system through input/output controller 810. The volatile RAM (random access memory) 809 is typically implemented as Dynamic RAM (DRAM) which requires power continually in order to refresh or maintain the data in the memory.
The non-volatile memory 811 is typically a magnetic hard disk drive or a magnetic optical drive or an optical drive or DVD RAM or flash memory or other type of memory system that maintains data (e.g., large amounts of data) even after power is removed from the system. Typically, the non-volatile memory 811 will also be a random access memory, but this is not required. Although FIG. 7 illustrates nonvolatile memory 811 as a local device coupled directly to the rest of the components in the data processing system, it will be appreciated that embodiments of the present disclosure may utilize nonvolatile memory remote from the system (e.g., a network storage device coupled to the data processing system through a network interface (e.g., modem, ethernet interface, or wireless network)). Bus 803 may include one or more buses connected to each other through various bridges, controllers, and/or adapters, as is known in the art.
Portions of the above description may be implemented using logic circuitry, such as dedicated logic circuitry, or using a microcontroller or other form of processing core that executes program code instructions. Thus, the processes taught by the discussion above may be performed with program code such as machine-executable instructions that cause a machine that executes the instructions to perform certain functions. In this context, a "machine" may be a machine (e.g., an abstract execution environment such as a "virtual machine" (e.g., a Java virtual machine), an interpreter, a common language runtime, a high-level language virtual machine, etc.) that converts intermediate form (or "abstract") instructions into processor-specific instructions, and/or electronic circuitry (e.g., a "logic circuit" implemented with transistors) disposed on a semiconductor chip (designed to execute instructions such as a general-purpose processor and/or a special-purpose processor). The processes taught by the discussion above may also be performed by electronic circuitry (as an alternative to, or in combination with, a machine) that is designed to perform the processes (or a portion thereof) without the execution of program code.
The present disclosure also relates to an apparatus for performing the operations described herein. The apparatus may be specially constructed for the required purposes, or it may comprise a general purpose device selectively activated or reconfigured by a computer program stored in the device. Such a computer program may be stored in a non-transitory computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magneto-optical disks, DRAMs (volatile), flash memory, read-only memory (ROM), RAM, EPROM, EEPROM, magnetic or optical cards, or any type of media suitable for storing electronic instructions, each coupled to a device bus.
A machine-readable medium includes any mechanism for storing information in a form readable by a machine (e.g., a computer). For example, non-transitory machine-readable media include read only memory ("ROM"); random access memory ("RAM"); a magnetic disk storage medium; an optical storage medium; a flash memory device; etc.
An article of manufacture may be used to store program code. The article of manufacture storing the program code may be embodied as, but is not limited to, one or more non-transitory memories (e.g., one or more flash memories, random access memories (static, dynamic or other)), optical disks, CD-ROMs, DVD ROMs, EPROMs, EEPROMs, magnetic or optical cards or other type of machine-readable media suitable for storing electronic instructions. The program code may also be downloaded from a remote computer (e.g., a server) to a requesting computer (e.g., a client) by way of data signals embodied in a propagation medium (e.g., via a communication link (e.g., a network connection)), and then stored in a non-transitory memory of the client computer (e.g., DRAM or flash memory or both).
The foregoing detailed description has been presented in terms of algorithms and symbolic representations of operations on data bits within a device memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, considered to be a self-consistent sequence of operations leading to a desired result. The operations are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the above discussion, it is appreciated that throughout the description, discussions utilizing terms such as "receiving," "determining," "sending," "terminating," "waiting," "changing," or the like, refer to the action and processes of a device, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the device's registers and memories into other data similarly represented as physical quantities within the device memories or registers or other such information storage, transmission or display devices.
The processes and displays presented herein are not inherently related to any particular apparatus or other apparatus. Various general-purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the described operations. The required structure for a variety of these systems will be apparent from the description below. In addition, the present disclosure is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the disclosure as described herein.
Exemplary embodiments of the invention
The following text presents numbered embodiments in a format similar to the claims, and it will be understood that these embodiments may be presented as claims in one or more future applications, such as one or more continuation or divisional applications.
Although individual embodiments are described in detail below, it should be understood that these embodiments may be combined or modified, in part or in whole.
Example embodiment 1.
A method for processing data, the method comprising:
receiving image data represented in a light field format, the image data comprising image data of a different view of the image for each of a plurality of pixels in the image;
receiving angular range metadata for the image;
receiving a selection of a desired viewpoint relative to the image; and
determining the one or more views at each of the plurality of pixels using a view function that determines the one or more views, wherein the view function is based on or has an input comprising: the spatial coordinates of each of the plurality of pixels in the image, the desired viewpoint, the received angular range metadata, and (1) the distance between the desired viewpoint and the display or (2) the amount of desired zoom/magnification.
Example embodiment 2.
The method of example embodiment 1, wherein the image is a 4D light field image that has been previously rendered as volumetric content, and wherein a selection of a desired viewpoint is received from a user to see the image at the desired viewpoint, and wherein the view function is an angular view function comprising a horizontal angular view function and a vertical angular view function; the horizontal angle view function has inputs that include: the distance between the desired viewpoint and the display, the horizontal spatial coordinates of the pixels, and the horizontal component of the desired viewpoint; the vertical angle view function has inputs that include: the distance between the desired viewpoint and the display, the vertical spatial coordinates of the pixels, and the vertical component of the desired viewpoint.
Example embodiment 3.
The method of example embodiment 1, wherein the view function is defined with respect to a reference plane at a reference distance from the display such that the view function will determine the same view for all pixels in the image for any one viewpoint in the reference plane.
Example embodiment 4.
The method of example embodiment 3, wherein, for a viewpoint outside the reference plane, the view function determines different views for different pixels in the image, and wherein the desired viewpoint is selected based on an estimated viewer position or a user-selected position.
Example embodiment 5.
The method of example embodiment 1, wherein the method further comprises:
rendering an image based on the determined one or more views; and
the rendered image is displayed in the determined view.
Example embodiment 6.
The method of example embodiment 1, wherein the image is a 4D light field image that has been previously rendered as volumetric content stored as tiles in a) decoded planar format, each tile being one of the possible views, or the volumetric content is stored in b) interleaved format.
Example embodiment 7.
The method of any one of exemplary embodiments 1 to 6, wherein the method further comprises:
receiving color volume mapping metadata;
a color volume map is applied based on the determined one or more views and color volume map metadata.
Example embodiment 8.
The method of example embodiment 7, wherein the color volume mapping metadata is adjusted based on the desired viewpoint and the angular offset metadata specifying one or more adjustments to the color volume mapping metadata based on or as a function of the desired viewpoint.
Example embodiment 9.
The method of example embodiment 8, wherein the method further comprises:
the diagonal offset metadata is interpolated based on the desired viewpoint.
Example embodiment 10.
The method of example embodiment 8, wherein the color volume mapping metadata varies on a scene-by-scene basis or on an image-by-image basis across a plurality of different images.
Example embodiment 11.
The method of example embodiment 1, wherein the method further comprises:
the determined one or more views are interpolated from a set of most recently available views in the image data at the desired viewpoint.
Example embodiment 12.
The method of example embodiment 11, wherein the interpolation uses bilinear interpolation from dense light field images with sufficiently high angular density in which differences between adjacent views are imperceptible or nearly imperceptible to a viewer at a reference viewing plane.
Example embodiment 13.
The method of example embodiment 1, wherein the method further comprises:
the viewpoint, which may be a desired viewpoint, is limited to an effective viewing zone, and wherein the effective viewing zone of the image is defined by angular range metadata (of the image) specifying an angular range through which the image can be accurately viewed.
Example embodiment 14.
The method of example embodiment 13, wherein the limiting comprises one of: (a) hard clamping an invalid viewpoint to a viewpoint in the effective viewing zone, or (b) soft clamping an invalid viewpoint to a viewpoint in the effective viewing zone; and wherein the hard clamp always selects a point on the boundary of the effective viewing zone, and the soft clamp selects a set of points near, but not on, the boundary of the effective viewing zone.
Example embodiment 15.
The method of example embodiment 14, wherein the method further comprises:
metadata is received that includes offset metadata associated with luminance metadata that specifies a statistical (e.g., average or median) luminance value of the image, the offset metadata specifying an adjustment to the luminance metadata as a function of the viewpoint.
Example embodiment 16.
A data processing system programmed or configured to perform the method of any one of exemplary embodiments 1 to 15.
Example 17.
A non-transitory machine readable medium storing executable program instructions which, when executed by a data processing system, cause the data processing system to perform the method of any of exemplary embodiments 1 to 15.
Example embodiment 18.
The method of any one of exemplary embodiments 1 to 6, 8 to 14, wherein the method further comprises:
receiving color volume mapping metadata;
a color volume map is applied based on the determined one or more views and color volume map metadata.
Example embodiment 19.
The method of any one of exemplary embodiments 1 to 14, wherein the method further comprises:
metadata is received that includes offset metadata associated with luminance metadata that specifies a statistical luminance value of the image, the offset metadata being used to adjust the luminance metadata based on the viewpoint.
In the foregoing specification, specific exemplary embodiments have been described. It will be evident that various modifications may be made thereto without departing from the broader spirit and scope as set forth in the following claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

Claims (18)

1. A method for processing data, the method comprising:
receiving image data represented in a light field format, the image data comprising image data of a different view of an image for each of a plurality of pixels in the image;
receiving angular range metadata of the image, wherein the angular range metadata specifies an angular range through which the image can be accurately viewed;
receiving a selection of a desired viewpoint relative to the image; and
determining one or more views at each of the plurality of pixels using a view function that determines the one or more views, the view function having an input comprising: spatial coordinates of each of the plurality of pixels in the image, the angular range metadata, the desired viewpoint, and a distance between the desired viewpoint and a display.
2. The method of claim 1, wherein the view function is an angular view function comprising a horizontal angular view function and a vertical angular view function; the horizontal angle view function has inputs including: the distance between the desired viewpoint and the display, the horizontal spatial coordinates of the pixels, and the horizontal component of the desired viewpoint; the vertical angle view function has inputs comprising: the distance between the desired viewpoint and the display, the vertical spatial coordinates of the pixels, and the vertical component of the desired viewpoint.
3. The method of claim 1, wherein the view function is defined relative to a reference plane at a reference distance from the display, and wherein the view function is to determine the same view for all pixels in the image for any one viewpoint in the reference plane.
4. A method as claimed in claim 3, wherein for viewpoints outside the reference plane, the view function determines different views for different pixels in the image, and wherein the desired viewpoint is selected based on estimated viewer position or user selection of the desired viewpoint.
5. The method of any of the preceding claims, wherein the method further comprises:
rendering the image based on the determined one or more views; and
the rendered image is displayed in the determined view.
6. The method of any of the preceding claims, wherein the image is a 4D light field image that has been previously rendered as volumetric content, the volumetric content being stored as tiles in a) decoded planar format, each tile being one of the possible views, or the volumetric content being stored in b) interleaved format.
7. The method of any of the preceding claims, wherein the method further comprises:
receiving color volume mapping metadata;
a color volume map is applied based on the determined one or more views and the color volume map metadata.
8. The method of claim 7, wherein the color volume mapping metadata is adjusted based on the desired viewpoint and angular offset metadata specifying one or more adjustments to the color volume mapping metadata based on or as a function of the desired viewpoint.
9. The method of claim 8, wherein the method further comprises:
the angular offset metadata is interpolated based on the desired viewpoint.
10. The method of any of claims 7 to 9, wherein the color volume mapping metadata varies on a scene-by-scene basis or on an image-by-image basis across a plurality of different images.
11. The method of any of the preceding claims, wherein the method further comprises:
the determined one or more views are interpolated from a set of nearest available views in the image data at the desired viewpoint.
12. The method of claim 11, wherein the interpolation uses bilinear interpolation from dense light field images with sufficient angular density.
13. The method of any of the preceding claims, wherein the method further comprises:
the viewpoint is limited to an effective viewing zone, and wherein the effective viewing zone of the image is defined by the angular range metadata.
14. The method of claim 13, wherein the restriction comprises one of: (a) hard clamping an invalid viewpoint to a viewpoint in the effective viewing zone, or (b) soft clamping the invalid viewpoint to a viewpoint in the effective viewing zone, wherein hard clamping always selects a point on a boundary of the effective viewing zone, and soft clamping selects a set of points near but not on the boundary of the effective viewing zone.
15. The method of any one of claims 1 to 14, further comprising:
metadata is received that includes offset metadata associated with luminance metadata that specifies a statistical luminance value of the image, the offset metadata specifying an adjustment to the luminance metadata as a function of the viewpoint.
16. The method of any of claims 1 to 15, wherein the image is a 4D light field image that has been previously rendered as volumetric content, and wherein the selection of the desired viewpoint is received from a user to see the image at the desired viewpoint.
17. A data processing system programmed or configured to perform the method of any one of claims 1 to 16.
18. A non-transitory machine readable medium storing executable program instructions which, when executed by a data processing system, cause the data processing system to perform the method of any of claims 1 to 15.
CN202180088332.8A 2020-12-04 2021-12-02 Processing of extended dimension light field images Pending CN116648903A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US202063121372P 2020-12-04 2020-12-04
EP20211870.9 2020-12-04
US63/121,372 2020-12-04
PCT/US2021/061683 WO2022120104A2 (en) 2020-12-04 2021-12-02 Processing of extended dimension light field images

Publications (1)

Publication Number Publication Date
CN116648903A true CN116648903A (en) 2023-08-25

Family

ID=87623439

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180088332.8A Pending CN116648903A (en) 2020-12-04 2021-12-02 Processing of extended dimension light field images

Country Status (1)

Country Link
CN (1) CN116648903A (en)

Similar Documents

Publication Publication Date Title
US20210099706A1 (en) Processing of motion information in multidimensional signals through motion zones and auxiliary information through auxiliary zones
CN110324664B (en) Video frame supplementing method based on neural network and training method of model thereof
US9843776B2 (en) Multi-perspective stereoscopy from light fields
US7689031B2 (en) Video filtering for stereo images
US6975329B2 (en) Depth-of-field effects using texture lookup
US9041819B2 (en) Method for stabilizing a digital video
US9165401B1 (en) Multi-perspective stereoscopy from light fields
WO2019159617A1 (en) Image processing device, image processing method, and program
US8976180B2 (en) Method, medium and system rendering 3-D graphics data having an object to which a motion blur effect is to be applied
WO2013074561A1 (en) Modifying the viewpoint of a digital image
US20130162625A1 (en) Displayed Image Improvement
KR20200052846A (en) Data processing systems
US20130129192A1 (en) Range map determination for a video frame
US20130129193A1 (en) Forming a steroscopic image using range map
WO2005006181A2 (en) Method of and scaling unit for scaling a three-dimensional model
US7616220B2 (en) Spatio-temporal generation of motion blur
CN110999285A (en) Processing of 3D image information based on texture maps and meshes
Li et al. A real-time high-quality complete system for depth image-based rendering on FPGA
CN110140151A (en) Device and method for generating light intensity image
Turban et al. Extrafoveal video extension for an immersive viewing experience
US10699383B2 (en) Computational blur for varifocal displays
KR20190011212A (en) Method of and data processing system for providing an output surface
JP2009237927A (en) Image composition method and device
JP2004356789A (en) Stereoscopic video image display apparatus and program
US20240031543A1 (en) Processing of extended dimension light field images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination