CN117099368A - System and method for processing volumetric images - Google Patents

System and method for processing volumetric images

Info

Publication number
CN117099368A
Authority
CN
China
Prior art keywords: images, series, image, metadata, path
Prior art date
Legal status
Pending
Application number
CN202280027069.6A
Other languages
Chinese (zh)
Inventor
R. Atkins
Current Assignee
Dolby Laboratories Licensing Corp
Original Assignee
Dolby Laboratories Licensing Corp
Priority date
Filing date
Publication date
Application filed by Dolby Laboratories Licensing Corp
Priority claimed from PCT/US2022/027918 (published as WO2022235969A1)
Publication of CN117099368A

Abstract

In one embodiment, a volumetric image of a scene may be created by: recording a series of images of a scene by a camera in a device as the camera moves along a path relative to the scene; during recording, the device stores motion path metadata about the path, the series of images is associated with the motion path metadata, and a metadata tag is associated with the series of images, the metadata tag indicating that the recorded series of images represents a volumetric image of the scene. The series of images, motion path metadata, and metadata tag may be assembled into a package for distribution to devices that can view volumetric images, which may be referred to as limited volumetric images. A device receiving the volumetric image may display individual images of the series of images or play the series back as video.

Description

System and method for processing volumetric images
Cross Reference to Related Applications
The present application claims priority from U.S. provisional patent application No. 63/185,082, filed May 6, 2021, and European patent application No. 21172493.5, filed May 6, 2021, each of which is hereby incorporated by reference in its entirety.
Background
The present disclosure relates to methods and systems for recording and displaying images. In particular, the present disclosure relates to methods and systems for recording and displaying volumetric images.
Traditional photographs capture 2D (two-dimensional) projections of a scene observed from a single viewpoint, i.e., the position of the camera's aperture at the time the photograph was taken. Traditional graphics, drawings, and computer-generated renderings likewise have only a single viewpoint.
However, in real life, the human visual system is exposed to very rich depth cues that result from viewing a scene from different perspectives. Such a vision system is able to compare the appearance of scenes seen from different viewpoints to infer the 3D (three-dimensional) geometry of the world around the observer.
This difference between the manner in which the real world is perceived and the manner in which it is captured today using traditional photography techniques significantly limits how well people can capture, share, and view the look and feel of a scene. Without depth information obtained from multi-view viewing, the true sense or understanding of the scene is lost and cannot be recovered.
Efforts have been made to close this gap. 3D binocular imaging captures a scene from two viewpoints corresponding to the two eye positions. This can greatly increase the sense of depth and realism. However, it is limited because the image remains the same as the observer moves around. This difference between the way an image "should" appear and the way it "does" appear limits how faithfully a rendered scene can represent the real one.
Recent efforts have involved volumetric or light field capture to create volumetric images. This involves an array of many cameras capturing a scene from many viewpoints simultaneously, which then allows the "real scene" to be reproduced correctly from a wide variety of viewpoints. A disadvantage of this approach is that in many cases installing the camera array required to capture the scene is impractical. It also requires careful calibration of the camera array to align the cameras, calibration of color sensitivity and lenses, and synchronization of capture times. Furthermore, in addition to complex rendering at playback, the amount of data created by this process requires complex image processing and compression for transmission.
A simple and user-friendly method of capturing a volumetric representation (e.g., a volumetric image) of a scene is desirable, but techniques known in the art do not provide one.
Disclosure of Invention
In one embodiment, a volumetric image of a scene may be created by: recording a series of images of a scene by a camera in a device as the camera moves along a path relative to the scene; during recording, the device may store motion path metadata about the path, the series of images being associated with the motion path metadata, and a metadata tag being associated with the series of images, the metadata tag indicating that the recorded series of images represents a volumetric image of the scene. The series of images, motion path metadata, and metadata tag may be assembled into a package for distribution to devices that can view volumetric images; because of the movement of the camera during recording, the volumetric image contains a set of images of the scene from different camera locations. A device receiving the volumetric image may display individual images of the series at a desired viewpoint, or may play the recording back as video. The volumetric image may be referred to as a limited volumetric image. In one embodiment, the recording may be by a camera set to a video capture mode (e.g., a movie mode), and the recording includes capturing and storing images continuously over a period of time, the capturing being performed at a predetermined frame rate (e.g., 30 or 60 frames per second) for displaying video. In another embodiment, the frame rate may not be predetermined; instead, the times at which frames are captured are based on the movement of the camera along the path, which may mean that the rate at which images are captured varies with the speed of movement of the camera along the motion path.
The images may be associated with the motion path metadata to associate and/or link each image to a location along the motion path (e.g., to associate and/or link each respective image to a respective location), such as the location along the motion path at which the image was captured. In an embodiment, the motion path may be captured or recorded together with the images (e.g., in synchronization), such as by a camera or a device configured to determine the motion path. In an embodiment, the series of images may be captured by capturing and storing images continuously over a period of time. Capturing may be performed at a predetermined frame rate for displaying video or at a rate based on camera movement along the path. The volumetric image may comprise a set of images of the scene from different camera positions or viewpoints, and the series of images, the associated motion path metadata, and the metadata tag may be assembled into a package. The series of images may be compressed in the package.
In one embodiment, after the series of images is recorded, the series of images may be adapted (conformed) to a desired motion path, and the motion path metadata may be updated based on the adapted series of images. For example, if the desired motion path is a horizontal line, then a vertical deviation in the actual motion path (as indicated by the motion path metadata) may be corrected to, or otherwise accommodated to, the desired horizontal line by cropping the vertically deviated image and interpolating portions of the cropped image; in one embodiment, this may also mean that some images are cropped away entirely, resulting in a large displacement from one image to the next, and that large displacement should be reflected in the updated motion path metadata. In one embodiment, after the series of images is recorded, the position of one or more images may be adjusted to smooth out the displacement between the images in the series along the desired motion path. For example, if some images are eliminated due to vertical cropping, the positions of images near the eliminated images are adjusted to smooth out the displacement between images along the motion path (e.g., to eliminate large jumps between adjacent images). When positions are adjusted, the motion path metadata should be updated to account for the adjustment. In one embodiment, the motion path metadata may indicate the actual physical displacement of the camera from one image to the next during recording (as the camera moves along the path). The motion path metadata may be used at playback to select a desired viewpoint on the scene for display; for example, if a viewer desires to see a viewpoint at the middle of the path, the motion path metadata is used to find the middle of the path and the image recorded closest to that point along the path.
In one embodiment, during recording, the camera (or a device containing the camera, such as a smartphone) may display a guide on a display device (e.g., an LCD or OLED display); the guide may show the user how to move the camera or device over a period of time (in both direction and speed) to produce a good recording. Also in one embodiment, the camera or device may store distance metadata that provides an estimate of the distance between one or more objects in the scene and the single camera; the distance metadata may be used when interpolating to a desired viewpoint. In one embodiment, the camera or device may also store dynamic range metadata indicating a dynamic range for each image in a set of images in the series of images, wherein the dynamic range for each image indicates a luminance range for that image (e.g., a maximum luminance value, an average luminance value, and a minimum luminance value in the image). The dynamic range metadata may be used on the playback device to adjust the luminance values of the image data based on the luminance range of the particular image and the luminance capabilities of the display device of the playback device, and this may be accomplished using techniques for color volume mapping that are known in the art.
Another aspect of the present disclosure is a playback device and method for playing back volumetric images. Embodiments of such a method may include the following operations: receiving a series of images and associated motion path metadata and a volume metadata tag indicating that the series of images represent a volume image of a scene; determining a desired viewpoint of the volumetric image; determining a selected image from a desired viewpoint based on the series of images; the selected image is displayed. The determination of the selected image may be based on a comparison of the desired viewpoint to the motion path metadata. The motion path metadata may indicate a displacement from one image to the next image in the series of images along a path used during recording of the series of images. The playback device may receive a package containing the series of images (e.g., in compressed format) along with the motion path metadata and the volume metadata tag assembled together in the package. In one embodiment, recording supports at least two modes of rendering content in a volumetric image at the playback device: (1) Displaying a single image at a desired viewpoint, and (2) displaying the series of images as a movie.
In an embodiment of the method and/or playback device, determining the selected image is based on a comparison of the desired viewpoint and the motion path metadata, and wherein the series of images is recorded in a single camera during successive capturing and storing of images over a period of time along the motion path of the single camera, and the capturing is performed at a predetermined frame rate for displaying video or at a rate based on movement of the camera along the path.
In one embodiment, the desired viewpoint may be determined at the playback device according to one of: (1) manual user selection from a user interface, or (2) sensor-based tracking of a user's face or head, or (3) a predetermined set of one or more viewpoints provided by a content creator, which may include an ordered sequence of images to be displayed. In one embodiment, the desired viewpoint is automatically determined based on the position of the viewer's head as detected through tracking by a sensor. The sensor may be a camera or a set of sensors, such as a conventional 2D camera and a time-of-flight camera or LIDAR (light detection and ranging). In one embodiment, the playback device may adapt the selected image by scaling the selected image or vertically shifting the image with an affine transformation. In one embodiment, a playback device may receive dynamic range metadata indicating a dynamic range for each image in a set of images in the series of images, the dynamic range for each image indicating a luminance range, and may map the selected image to the dynamic range capabilities of the target display based on the dynamic range metadata of the selected image (using known color volume mapping techniques). In one embodiment, the selected image is interpolated by the playback device from a set of images in the series of images that represent a match between the desired viewpoint and the motion path metadata.
Aspects and embodiments described herein may include a non-transitory machine readable medium that may store executable computer program instructions that, when executed, cause one or more data processing systems to perform the methods described herein. The instructions may be stored in a non-transitory machine-readable medium such as Dynamic Random Access Memory (DRAM), which is volatile memory, or in non-volatile memory such as flash memory, or in other forms of memory. Aspects and embodiments described herein may also take the form of a data processing system constructed or programmed to perform these methods. For example, a data processing system may be built with hardware logic for performing the methods or may be programmed with a computer program for performing the methods, and such a data processing system may be referred to as an imaging system. The data processing system may be any of the following: a smartphone including a camera; a tablet computer including a camera; a laptop computer with a camera; a traditional camera with additional hardware or software to capture motion path metadata and perform the other tasks described herein; and other devices.
The above summary does not include an exhaustive list of all embodiments and aspects of the present disclosure. All systems, media, and methods can be practiced according to all suitable combinations of the aspects and embodiments summarized above and those disclosed in the detailed description below.
Drawings
The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings in which like reference numerals refer to similar elements.
Fig. 1 shows an example of a method of recording a scene by moving a camera horizontally across the scene (a set of trees) while recording images of the scene in video mode.
Fig. 2A illustrates an example of a method of recording a scene by moving a camera in a circular motion across the scene (a set of trees) while recording an image of the scene in video mode.
Fig. 2B illustrates an example of a method of recording a scene by moving a camera in a random path across the scene (a set of trees) while recording an image of the scene in video mode.
Fig. 2C illustrates an example of a method of recording a scene by moving a camera in a serpentine path across the scene (a set of trees, not shown) while recording images of the scene in video mode.
FIG. 3 illustrates an example of a user interface displaying a guide that assists a user in recording a scene by providing cues for one or more of direction of movement and speed of movement.
Fig. 4A shows the expected path of motion for recording a series of images.
FIG. 4B illustrates an example of an actual motion path taken during recording of a scene to create a volumetric image; the actual motion path exhibits deviations from the intended motion path shown in fig. 4A.
Fig. 5 shows an example of a measurement of the distance between an object in a scene being recorded and a camera; in another embodiment, the distance may be recorded for each recorded image.
Fig. 6A shows another example of an actual motion path (for an expected path as a horizontal line); the figure also shows possible interpolation of missing parts of the image (video frame).
Fig. 6B shows an example of how the images in the series shown in fig. 6A can be adapted to the intended horizontal path by cropping the images to remove deviations from the horizontal path.
Fig. 6C shows an example of how the images in the series shown in fig. 6B may be further adapted to smooth the displacement between the images (e.g., by equalizing the displacement distance between frames).
Fig. 7A shows an example of a method of recording a volume image.
Fig. 7B is a flowchart showing another example of a method of recording a volumetric image.
Fig. 8A illustrates an example of a method for displaying or playing back one or more of the volumetric images.
Fig. 8B illustrates another example of a method for displaying or playing back one or more of the volumetric images.
Fig. 8C shows an example of a playback device (e.g., a smartphone) that includes a display device and a front-facing camera over the display device.
Fig. 8D illustrates an example of a User Interface (UI) that allows a user to select a particular viewpoint on a volumetric image.
FIG. 9 is a block diagram illustrating an example of a data processing system that may be used to implement one or more embodiments described herein.
Detailed Description
Various embodiments and aspects will be described with reference to details discussed below, and the accompanying drawings will illustrate the various embodiments. The following description and drawings are illustrative and should not be construed as limiting. Numerous specific details are described to provide a thorough understanding of the various embodiments. However, in some instances, well-known or conventional details are not described in order to provide a concise discussion of embodiments.
Reference in the specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment. In this specification, the appearances of the phrase "in one embodiment (in one embodiment)" in various places are not necessarily all referring to the same embodiment. The processes depicted in the figures below are performed by processing logic that comprises hardware (e.g., circuitry, dedicated logic, etc.), software, or a combination of both. Although the processes are described below in terms of some sequential operations, it should be appreciated that some of the operations described may be performed in a different order. Further, some operations may be performed in parallel rather than sequentially.
Recording of volumetric images
Embodiments may begin with a user moving a camera and recording video of a scene while moving the camera; the term video is intended to include the image capturing process, i.e., capturing and storing images continuously at a predetermined frame rate over a period of time. The predetermined frame rate may be a standard video or movie frame rate, such as 30 frames per second (i.e., 30 image frames are captured and stored per second while recording a scene); as is known in the art, the frame rate may be higher or lower than 30 frames per second. Higher frame rates will increase the chance of being able to perform some of the interpolations described herein. The (recorded) time period may be as short as a few seconds or as long as a few minutes while the user moves the camera. A recording device including a camera may specify a motion path, and the specified motion path may be selected by a user before starting the recording process. The motion path may be any of the following: a horizontal line path, a vertical path, a circular path, a square path, a random path, or a serpentine path. The simplest, horizontal motion path is described in detail in this disclosure, but embodiments may be extended to other motion types as well. In another embodiment, the frame rate may not be predetermined; instead, the times at which frames are captured are based on the movement of the camera along the path, which may mean that the rate at which images are captured varies with the speed of movement of the camera along the motion path. For example, a device containing a camera may monitor movement along a motion path as described below and capture frames or images based on the movement. When the displacement from the position of the last captured frame reaches a desired value (e.g., the camera has moved by a displacement of 1 mm since the last image was captured), the camera captures the next frame, and this may be repeated as the camera moves along the motion path. Thus, movement of the camera triggers when a frame is captured (e.g., one frame is captured for every 1 mm of displacement along the path), such that the position of the camera along the path controls the time at which each frame is captured. Doing so may minimize the number of captured frames (while still providing good coverage of the scene when the displacement values between frames are small) and may also simultaneously adapt the images to the motion path (at least adapting the spacing of the captured images along the path, as described below in connection with fig. 6C). This adaptation may be done during acquisition, rather than as a post-processing operation as described below.
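By way of illustration only, a minimal sketch of this displacement-triggered capture loop is given below in Python. The camera and position-sensor interfaces (`read_position`, `capture_frame`) are hypothetical placeholders, not part of this disclosure or of any particular device API.

```python
import math

def record_by_displacement(read_position, capture_frame, step_mm=1.0, num_frames=100):
    """Capture a frame each time the camera has moved step_mm along the path.

    read_position() -> (x, y) position in mm relative to the start of recording.
    capture_frame() -> one image from the camera.
    Returns the captured frames and per-frame (dx, dy) displacements in mm,
    which become the motion path metadata.
    """
    frames, displacements = [], []
    last_x, last_y = read_position()
    frames.append(capture_frame())
    displacements.append((0.0, 0.0))      # first frame is the path origin
    while len(frames) < num_frames:
        x, y = read_position()
        dx, dy = x - last_x, y - last_y
        if math.hypot(dx, dy) >= step_mm:  # moved far enough since the last capture
            frames.append(capture_frame())
            displacements.append((dx, dy))
            last_x, last_y = x, y
    return frames, displacements
```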
The camera may be a digital camera, such as an optical camera and/or a video camera. The camera may be configured to capture light in the visible spectrum (i.e., the spectrum visible to the human eye).
Some examples of different motion paths will now be described. Fig. 1 shows an example of recording a scene 10A (a set of trees) by capturing a series of images 12 while moving the camera along a horizontal path of motion 14. While recording is occurring, the device containing the camera may record the position of the camera relative to the scene by recording the displacement or movement of the camera in at least the X-direction and the Y-direction (with the horizontal direction on the X-axis and the vertical direction on the Y-axis); the displacement recording may be performed for each image in a series of images, and the recording will be described further below. Fig. 2A shows an example of recording a scene 10B (another set of trees) by capturing a series of images 16 while moving the camera in a circular path 18. Fig. 2B shows an example of recording a scene 10C (another set of trees) by capturing a series of images 20 along a random motion path 22 (moving in both the horizontal and vertical directions). Fig. 2C shows an example of recording a scene (not shown) by capturing a series of images 25 in a serpentine motion path 23. For each of these recordings, the device containing the camera may record the position of the camera at each of the series of images (or at least a subset of those images), and the recording may begin at an initial assumed position of 0,0 for the first image (e.g., x=0, y=0), and each image thereafter is associated with a displacement of the camera from its previous image to the current image in the series (e.g., Δx=0.25 mm, Δy=0.0 mm). For example, the position of the first image is x=0, y=0, and then the position of the next (2 nd) image is x=0.25, y=0.0 (if the displacement from the first image to the next image is Δx=0.25 mm, Δy=0.0 mm). These recorded displacements may be referred to as motion path metadata that describes the position of an image in a series of images along a motion path such that an image may be later selected based on the position of the image on the motion path (e.g., the first image along the path, or the last image along the path, or an image along about 1/3 way along the path, etc.). Further information about the motion path metadata is provided below.
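The following short sketch, offered only as an illustration (the function name is an assumption, not part of the disclosure), shows how per-image displacements of this kind can be accumulated into absolute positions along the motion path, matching the example above in which the first image is at x=0, y=0 and the second image is displaced by Δx=0.25 mm.

```python
def positions_from_displacements(displacements_mm):
    """displacements_mm: list of (dx, dy) per image, in mm.
    Returns the accumulated (x, y) position of each image along the motion path."""
    x, y = 0.0, 0.0
    positions = []
    for dx, dy in displacements_mm:
        x += dx
        y += dy
        positions.append((x, y))
    return positions

# Example: first image at (0, 0), second image displaced by 0.25 mm horizontally.
print(positions_from_displacements([(0.0, 0.0), (0.25, 0.0)]))  # [(0.0, 0.0), (0.25, 0.0)]
```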
During recording of a scene, the camera or the device containing the camera may display a guide to the user to show the user how to move the camera or the device while recording the scene. FIG. 3 shows an example of a wizard 53 that may be displayed on a display device 51 of a camera (or a device containing a camera); the display device 51 may be, for example, a front display of a smart phone. Wizard 53 may be displayed as a semi-transparent overlay over an image displayed shortly after being captured. The wizard may show a representation of the scene captured by the camera and a shadow area representing which parts of the scene have been captured (and which parts are still to be captured), as well as an arrow indicating the direction of the next movement of the camera. The additional text may instruct the user to increase or decrease the speed of movement or to better follow the desired capture path. The guide may be displayed in many different forms, such as a line with an arrow indicating the desired direction of movement, and the size of the line may increase over time, with the rate of increase indicating the desired speed of movement. Another example of a user interface for a wizard may use a bar of fixed size, wherein a pointer on the bar moves along the bar in a desired direction (at a desired speed). It is expected that the user will tend to move faster at the beginning of the motion path and slower near the end of the motion path, and such a wizard may help the user control both the direction of the actual path and the speed of the actual path. The embodiments described below may be used to eliminate deviations in both direction and speed using post-processing operations that may be performed on a recording device or a playback device.
In an embodiment, instructions may be provided to the camera to display the guide during the capture of a series of images. The instructions may be provided by the data processing system to the camera, such as a processing unit and/or a display unit of the camera, possibly during receipt of a series of images from the camera at the data processing system. The instructions may be generated at the data processing system and/or may cause the camera to display the guide.
In one embodiment, the motion path metadata provides an estimate of the actual motion path of the camera, such as the displacement of each video frame or image from the previous video frame or image. Fig. 4A and 4B show the difference between the expected motion path 75 (in fig. 4A) and the actual motion path 90 (in fig. 4B) taken during recording. The expected motion path 75 includes a series of images 76-87 that are uniformly spaced along the expected motion path 75. Thus, the expected motion path 75 is perfectly horizontal, with no deviation, and the speed of motion along the path is uniform (no change in speed during the recording period). In the actual motion path 90 (fig. 4B), the motion (or capture) path is not purely horizontal but also contains vertical offsets or deviations; the actual motion path 90 includes images 89 and 91-99. In addition, the amount of horizontal displacement of the camera between frames is not uniform; there are several irregular jumps between images (e.g., from image 89 to the next image) and also several overlapping images, so the speed of motion varies over the recording period. The recorded motion path metadata shown in fig. 4B will reveal deviations off the path (e.g., vertical deviations) as well as displacement deviations due to changing motion speeds during recording.
Image processing may be used in the device to estimate the actual motion path to estimate motion from the captured images, or other sensors such as accelerometers and gyroscopes may be used by the device to determine displacement between images along the path. In one embodiment, the relative position of the user's face and the preview window is used to estimate the actual motion path. The motion path may be stored as displacement path metadata associated with the limited volume image. In one embodiment, this metadata may be expressed as a 2D (horizontal and vertical) displacement [ deltax, deltay ] between each frame [ n ], in millimeters (mm), with an accuracy of up to 1/4mm, and as a 12-bit unsigned value:
MotionPath[n] = CLAMP(0, 4095, floor([deltax, deltay] * 4 + 0.5) + 2048)
where c = CLAMP(a, b, x) means:
c = a if x ≤ a; c = b if x ≥ b; otherwise c = x.
This example works for displacements from ±0.25 mm up to ±512 mm.
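A small sketch of this encoding, following the CLAMP and floor definitions given above, is shown below; it is illustrative only and not a normative implementation.

```python
import math

def clamp(a, b, x):
    # CLAMP(a, b, x): a if x <= a, b if x >= b, otherwise x
    return a if x <= a else b if x >= b else x

def encode_displacement(delta_mm):
    """Quantize one displacement component (in mm) to a 12-bit unsigned value
    with 1/4 mm steps, offset by 2048 so negative displacements fit."""
    return clamp(0, 4095, math.floor(delta_mm * 4 + 0.5) + 2048)

def decode_displacement(code):
    """Inverse of encode_displacement (to within quantization error)."""
    return (code - 2048) / 4.0

# MotionPath[n] holds the encoded (deltax, deltay) pair for frame n.
motion_path = [(encode_displacement(dx), encode_displacement(dy))
               for dx, dy in [(0.0, 0.0), (0.25, 0.0), (1.5, -0.25)]]
```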
In another embodiment, the motion path metadata may be expressed more simply as the total displacement between the first frame and the last frame (e.g., the camera traveled a total of X mm during recording), which can be divided by the number of frames to obtain the displacement offset for each frame. If a sensor is used to capture the motion path metadata, the recording device will typically generate the motion path metadata, but if image processing is used to derive the motion path metadata, the playback device may generate the motion path metadata after receiving the series of images (and also perform the optional image processing described herein). As described further below, the motion path metadata allows the playback device to select an image in the volumetric image based on the position of the image on the motion path (e.g., the first image along the path, or the last image along the path, or an image about one third of the way along the path, etc.); for example, if the desired viewpoint is one third of the way along the path and the path is 24 mm long (from start point to end point), the playback device uses the motion path metadata to find the image closest to 8 mm from the start point of the motion path.
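As an illustrative sketch (the function name and interface are assumptions, not part of the disclosure), a playback device could use the per-frame displacement metadata to locate the frame recorded closest to a desired fraction of the way along the path, e.g., the image nearest 8 mm from the start of a 24 mm path for a viewpoint one third of the way along:

```python
import math

def closest_frame_index(displacements_mm, fraction_along_path):
    """Find the frame recorded closest to a point a given fraction (0..1)
    along the motion path, using the cumulative path length derived from
    the per-frame displacement metadata."""
    cumulative, total = [], 0.0
    for dx, dy in displacements_mm:
        total += math.hypot(dx, dy)
        cumulative.append(total)
    target = fraction_along_path * total   # e.g. 1/3 of a 24 mm path -> 8 mm
    return min(range(len(cumulative)), key=lambda i: abs(cumulative[i] - target))
```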
Another set of metadata that may be created and used is scene distance metadata. Scene distance metadata provides an estimate of the distance of a scene from a camera. Fig. 5 illustrates metadata describing a distance 101 of a scene to a capture device that is recording a series of images, including images 103 and 109 along a motion path 111. In one embodiment, the distance may be a single value representing the distance from the camera to a single object of interest. The object of interest is typically a foreground object, such as a person, or in the example shown in fig. 5 a car. Other embodiments may store a depth map of the entire scene for a reference location (center location of the motion path) or for each captured location of each image. The distance of the object of interest may be estimated for a single point by using the range finding function of the camera (typically for auto-focusing). Alternatively, depth may be estimated by comparing images captured from multiple locations with information about displacement, or from images captured from two or more cameras with known displacement. Such techniques are known in the art and have been used to extract depth maps from binocular 3D image pairs. In one embodiment, a SLAM (simultaneous localization and mapping) algorithm may be used to determine the position of a camera within a scene. In one embodiment, the metadata is represented as the inverse of the distance in meters represented using a 12-bit unsigned value:
SceneDistance = CLAMP(0, 4095, floor(1/d * 4096 + 0.5)),
where floor(x) is the largest integer less than or equal to x.
This metadata is valid for distances from just over 1 m (4096/4095 m) to infinity (the last finite distance before infinity is 8192 m). The distance information may be used when performing interpolation to create an image at a desired viewpoint, in the case where a plurality of images are selected as possible candidates for the desired viewpoint at the playback device.
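A sketch of this inverse-distance encoding, together with an approximate decoder, is shown below; the function names are illustrative assumptions only.

```python
import math

def clamp(a, b, x):
    return a if x <= a else b if x >= b else x

def encode_scene_distance(d_meters):
    """Encode a distance (m) as a 12-bit unsigned inverse-distance value,
    per the SceneDistance formula above."""
    return clamp(0, 4095, math.floor(1.0 / d_meters * 4096 + 0.5))

def decode_scene_distance(code):
    """Approximate inverse; code 0 is treated here as 'infinitely far'."""
    return float('inf') if code == 0 else 4096.0 / code

# A subject just over 1 m away saturates the code at 4095 (4096/4095 m);
# the farthest finite encodable distance is about 8192 m (code 1).
print(encode_scene_distance(2.0), decode_scene_distance(encode_scene_distance(2.0)))
```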
Another set of metadata that may be created and used is dynamic range metadata. The dynamic range metadata may describe a luminance range (e.g., a minimum luminance value, an average luminance value, and a maximum luminance value) of content from each viewpoint (e.g., each image in a series of images in a volumetric image). Additional metadata describing the luminance range of the scene from that view may be collected or calculated for each frame or image in the record. Thus, when the camera moves from a very dark portion to a very bright portion of the scene, the dynamic range metadata may reflect statistics of the content (e.g., minimum, average, and maximum luminance values for each image in a series of images) to guide downstream color volume mapping algorithms, such as those used in Dolby Vision (Dolby Vision) HDR processing. To ensure temporal stability when the user switches viewpoints, temporal filtering may be applied, as discussed in U.S. provisional patent application No. 63/066,663, "Picture metadata for high dynamic range video [ picture metadata for high dynamic range video ]" filed by r.atkins on day 8, month 17 of 2020. The metadata may be in addition to existing dolby view metadata that is common to all frames in the sequence. In one embodiment, the dynamic range metadata may be represented as per-frame [ n ] offset (minimum, middle, and maximum luminance offsets represented by PQ) of dolby view L3 metadata:
ViewPointOffsets[n] = CLAMP(0, 4095, floor([PQOffsetMin/Mid/Max] * 4096 + 0.5) + 2048)
The metadata may be valid within a −0.5 to 0.5 offset range of PQ brightness. Upon playback, the metadata may be used to adjust image data values at pixels in the image based on the dynamic range of the particular image at the desired viewpoint and the luminance range capabilities of the particular display device on the playback device.
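The following sketch, assuming the formula above, illustrates how the per-frame (minimum, mid, maximum) PQ offsets might be quantized to 12-bit values; it is illustrative only and not a normative Dolby Vision implementation.

```python
import math

def clamp(a, b, x):
    return a if x <= a else b if x >= b else x

def encode_viewpoint_offsets(pq_offsets):
    """Encode per-frame (min, mid, max) PQ luminance offsets, each expected
    in roughly the -0.5..0.5 range, as 12-bit unsigned values per the
    ViewPointOffsets formula above."""
    return [clamp(0, 4095, math.floor(v * 4096 + 0.5) + 2048) for v in pq_offsets]

# Example: a viewpoint slightly darker than the sequence-level nominal values.
print(encode_viewpoint_offsets((-0.1, -0.05, 0.02)))
```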
As described below, once a scene is captured, optional image processing may be applied to improve the captured video. The goal of this processing may be to simplify playback behavior because by applying the processing once on a system other than the playback device during or after image capture, the computation required on the playback device may be minimized to enable a wider range of playback devices. If image processing is not applied at the time of image capturing but at the time of image playback, higher image fidelity may be obtained in some cases (because the full quality captured signal may be stored and transmitted), but the playback device may need to perform similar functions as described below. Applying the processing described below at or after capture may also improve compression efficiency because there may be a higher correlation between adjacent frames along the camera motion path.
One optional image processing method may attempt to adapt or align the images with an intended motion path (which may be selected by the user before or after a recording session of the scene); such adaptation or alignment may adjust the images to correct for deviations off the intended motion path. The system in one embodiment may require the user to confirm or select the desired path (e.g., horizontal, circular, vertical, etc.), and then perform the operations described below to adapt or align the images or frames with the desired motion path. For example, if only a horizontal motion path is desired, any deviation in the vertical direction may be removed by cropping individual video frames (as is typically done when capturing panoramic photographs). For example, FIG. 6A shows the actual motion path 90 (including images 89, 91-93, and 98) for an intended horizontal motion path, and FIG. 6B shows the corrected path after vertical alignment and cropping are applied (the path includes cropped images, including images 125-129 and 135). Cropping may preserve those portions of each image that are common to all images while discarding the rest of each image. This can be seen in fig. 6B, which shows how only the common portion of the trees in images 93 and 89 remains in the cropped versions, images 128 and 129. Instead of cropping portions of an image, missing portions of the image may preferably be interpolated based on neighboring frames. This may be done using techniques known in the art for interpolating additional frames for frame rate interpolation, but in this case used to paint in larger missing areas of the image. For example, as illustrated, the top region (dashed line 115) of image 89 in fig. 6A may have missing information that can be interpolated from nearby frames. When such image processing has been applied, the capture path metadata should be updated accordingly to reflect the new corrected displacements of the images (e.g., cropping may eliminate an entire image and require adjustment of the displacement values along the path to compensate for the deleted image). Although only vertical correction is illustrated, the technique may also be applied to other capture paths, such as circular paths. Instead of simply cropping the images, they may be "shifted" to align with the intended motion path; this is very similar to the adaptation along the motion path described in the next section. The shift may be as simple as a translation, but may be more complex for devices that can support it, including interpolation and inpainting using information present in neighboring images. This is preferable to cropping but requires much more computation.
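A simplified sketch of this crop-based vertical alignment is shown below. It assumes array-like images (e.g., numpy arrays) and a hypothetical pixels-per-millimeter factor relating camera displacement to image displacement; as noted above, a real implementation might instead inpaint missing regions from neighboring frames and drop frames that deviate too far, updating the motion path metadata accordingly.

```python
def align_to_horizontal_path(frames, positions_mm, pixels_per_mm):
    """Crop every frame to the rows common to all frames, removing vertical
    deviation from an intended horizontal motion path.

    frames: list of HxWxC image arrays; positions_mm: (x, y) per frame in mm.
    Returns cropped frames and updated positions with the y deviation removed.
    Illustrative sketch only."""
    height = frames[0].shape[0]
    # approximate vertical pixel offset of each frame relative to the intended y = 0 line
    offsets = [int(round(y * pixels_per_mm)) for _, y in positions_mm]
    top, bottom = max(0, *offsets), min(0, *offsets)
    common_height = height + bottom - top
    if common_height <= 0:
        raise ValueError("no rows common to all frames; remove outlier frames first")
    cropped, new_positions = [], []
    for frame, off, (x, _) in zip(frames, offsets, positions_mm):
        start = top - off
        cropped.append(frame[start:start + common_height])
        new_positions.append((x, 0.0))   # updated metadata: vertical deviation removed
    return cropped, new_positions
```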
Another optional image processing method may attempt to adapt or align the images to correct for deviations along the intended motion path (which may be selected by the user before or after the scene recording period); such adaptation or alignment may adjust the images to correct for variations in the speed of movement along the intended motion path during recording. For example, such processing may align video frames to equalize or smooth out the displacement between images along the motion path. In a horizontal motion path, the movement may be faster at the beginning and then slower, resulting in larger displacements between images at the beginning and smaller displacements at the end, as illustrated in fig. 6B. Video processing may be applied to equalize the displacements by interpolating additional images at the beginning or removing some at the end, making the displacement between frames uniform, as shown in fig. 6C. When such image processing has been applied, the motion path metadata should be updated accordingly to reflect the new corrected displacement values between the images.
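A rough sketch of this displacement equalization follows. For simplicity it only resamples by picking the nearest captured frame for each uniformly spaced position; a fuller implementation would interpolate new frames where the gaps are large, as described above.

```python
import math

def equalize_displacements(frames, positions_mm):
    """Resample a series so that frames are roughly uniformly spaced along the
    motion path. Sketch only: picks the nearest captured frame for each uniform
    target position instead of interpolating new frames."""
    n = len(frames)
    if n < 2:
        return frames, positions_mm
    # cumulative distance along the path for each captured frame
    dist = [0.0]
    for (x0, y0), (x1, y1) in zip(positions_mm, positions_mm[1:]):
        dist.append(dist[-1] + math.hypot(x1 - x0, y1 - y0))
    total = dist[-1]
    out_frames, out_positions = [], []
    for i in range(n):
        target = total * i / (n - 1)          # uniformly spaced target positions
        j = min(range(n), key=lambda k: abs(dist[k] - target))
        out_frames.append(frames[j])
        out_positions.append(positions_mm[j])  # metadata must be updated to match
    return out_frames, out_positions
```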
Another optional image processing method may attempt to remove object motion in the scene (e.g., a person running through the scene). Some moving objects in a scene (such as people, vehicles, even waves and clouds) may interfere with the intended experience of watching the same scene from different viewpoints at playback. To improve the experience, image processing techniques known in the art and commonly used when capturing panoramic images may be used to remove movement of objects in the scene. This is sometimes referred to in the art as "ghost removal". See M. Uyttendaele et al., "Eliminating ghosting and exposure artifacts in image mosaics," Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), Vol. 2, IEEE, 2001.
Distribution of volumetric images
Once the limited volume image is captured and processed, it is ready for distribution. This may be accomplished by compressing the frames in order from the first image captured along the camera motion path during recording to the last image along the motion path at the end of recording using standard video coding techniques. The correlation between images can be very high, which can be exploited by video coding engines to produce very efficient compressed representations of limited volume images.
It will be apparent to those skilled in the art that other encoding techniques are possible, including current efforts in multi-image encoding or encoding for augmented or virtual reality content. However, these techniques are not requirements of embodiments in this disclosure, which describes techniques for delivering limited volume content even to devices using conventional video decoders. This allows for low power low complexity decoding operations, as such decoders are highly optimized for power consumption and performance.
In some embodiments, the GOP structure of the encoding process (group of pictures, referring to the choice of I, B, and P frames) may be symmetrical so that the decoding speed may be optimized for both forward and reverse directions:
1) In particular embodiments, encoding may use only I frames without any inter prediction at all.
2) In another embodiment, the encoded limited volume image contains two I frames as the first and last frames of the sequence, wherein all other frames are P frames predicted from only the first and last frames.
3) In another particular embodiment, the sequence of frames is encoded in both forward and reverse order and the two are concatenated together, to further facilitate smooth playback in both directions on a decoder optimized for playing in only a single direction.
The encoded video content and the metadata described in the previous section may be multiplexed or assembled together to form a single packet containing a limited volume image. Additional metadata may be added to:
1) This type of video is marked as limited volume video (to distinguish it from other types of video so that the playback device can handle it correctly), for example using Dolby Vision L11 "content type" metadata, as proposed by R. Atkins and P. J. A. Klittmark in WIPO WO 2020/265409, "Video content type metadata for high dynamic range".
2) A representative frame may be included as a thumbnail image or as the first frame of the video to facilitate thumbnail views and navigation of the images; this representative frame may not correspond to the first frame of the capture path.
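Purely as an illustration, the assembly of such a package might look like the sketch below; the container layout (a zip archive with a JSON metadata sidecar) and the field names are assumptions made for this example, not a format defined by this disclosure.

```python
import json
import zipfile

def assemble_volumetric_package(out_path, encoded_video_path, motion_path,
                                scene_distance, viewpoint_offsets, thumbnail_index=0):
    """Bundle the encoded video and its metadata into one distributable file.
    The zip-plus-JSON layout is an illustrative choice only."""
    metadata = {
        "content_type": "limited_volumetric_image",   # the volumetric metadata tag
        "motion_path": motion_path,                    # per-frame [dx, dy] codes
        "scene_distance": scene_distance,              # 12-bit inverse-distance value
        "viewpoint_offsets": viewpoint_offsets,        # per-frame dynamic range offsets
        "thumbnail_frame": thumbnail_index,            # representative frame for previews
    }
    with zipfile.ZipFile(out_path, "w") as pkg:
        pkg.write(encoded_video_path, arcname="video.bin")
        pkg.writestr("metadata.json", json.dumps(metadata))
```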
Playback
A playback device receiving a volumetric image may detect or identify it as a volumetric image based on the presence of a volumetric tag or label. This identification may cause the device to pass the volumetric image to the video decoder with playback initially paused (so that the recorded movie does not automatically play). The playback device may then determine a desired viewpoint of the volumetric image and then select or generate an image at the desired viewpoint.
The determination of the desired viewpoint by the playback device may depend on the capabilities and configuration of the playback device. In a so-called preview mode, the playback device does not receive user interaction to select a desired viewpoint; instead, the playback device automatically advances the viewpoint in a continuous and cyclic fashion. The content creator (of the volumetric image) may select a particular sequence of viewpoints (which may be different from the natural sequence through the motion path used during recording), and this sequence may be used instead of the natural sequence from the start to the end of the motion path. The user is able to see the image from a plurality of different perspectives but does not attempt to control the desired perspective. In one embodiment, user control may not be provided in the preview mode; in another embodiment, the user may stop or exit the preview mode and invoke a user interface element to select a desired viewpoint, or invoke the tracking mode by selecting a user interface element that invokes the tracking mode. In a so-called manual mode, the playback device displays one or more user interfaces that allow a user to select a desired viewpoint. In manual mode, the user interacts with a control to select the desired viewpoint, for example by dragging a finger over the displayed image of the volumetric image, or by using a mouse or finger to control a slider. The user takes some form of control through one or more user interface elements over the displayed view, but this requires intentional manual interaction by the user. In a so-called tracking mode, one sensor or a set of sensors on the playback device is used to estimate the position of the user relative to the screen on which the image from the volumetric image is displayed. The sensor may be, for example, a front-facing camera (e.g., sensor 355 in fig. 8C) that captures an image of the user's head to determine the user's position or gaze point on the screen. A suitable camera may be a structured light or time-of-flight camera, but a conventional 2D camera may also be used. The position of the face is extracted from the camera view using known techniques. As the user moves (in the X, Y, and Z directions), the viewpoint is updated based on the user's location. With sufficient metadata and calibration, this mode can provide a high-fidelity rendering of the scene by matching the user's viewpoint with the correct viewpoint captured in the actual scene.
The selection or generation of images by the playback device at the desired viewpoint may also depend on the capabilities and configuration of the playback device, and also on the range of motion paths of the content. In one embodiment, the playback device may perform the following operations.
a. First, the closest image(s) along the motion path are found using the desired viewpoint (which corresponds to a desired position along what may be regarded as the video). This may be done by comparing the desired viewpoint to the motion path metadata to find the closest match(es) (see the sketch following this list). In one embodiment, the image with the closest match between the motion path and the desired viewpoint is selected. For example, if the desired viewpoint is at the middle of the motion path (in effect, in the middle of the volumetric image), and the motion path is 24 mm long, then the image closest to 12 mm from the start of the motion path is selected. As described herein, the motion path metadata may include the location of each image; when displacements are stored in the metadata, the position of an image may be calculated as the sum of the displacements of all preceding images/frames. In another embodiment, the nearest N frames are selected, where N is between 2 and 6, and used for interpolation to create an image at a closer position. The motion path metadata may be preprocessed to create a 1D or 2D array of entries that is stored in memory with the corresponding desired view and frame index(es), so that this processing need only be done once.
b. Next, the selected image(s) or frame(s) are decoded by a video decoder in the playback device. This may also be done in a preprocessing stage and the results stored in memory to allow faster and smoother access to the decoded image, depending on the capabilities and desired performance of the playback device.
c. The decoded frame(s) may then be displayed or interpolated into a new view. If only a single frame is selected in step (a), the image may be displayed directly. If a plurality of frames are selected in step (a), these frames may be shown consecutively in sequence, or further processing may be performed to interpolate and create an image at a further viewpoint. Methods of performing such interpolation for the purpose of frame rate interpolation or interpolating a missing view are known in the art. The distance of the observer may also be included in this interpolation, as described in U.S. provisional patent application No. 63/121,372, "Processing of extended dimension light field images," filed by Atkins on December 4, 2020.
d. Optionally, the decoded and/or interpolated frames may be further adapted according to the desired view metadata. For example, for a limited volume view with only a horizontal motion path, there is no captured viewpoint corresponding to vertical displacement. However, to simulate a certain amount of vertical movement, the image may be adjusted to be vertically displaced. Also, the image may be adapted to simulate the amount of zoom for closer or farther viewing positions. The vertical shift and scaling may be combined into an affine transformation that is applied to the image before or at the time of display of the image. In one embodiment, the transformation may also be calculated to include the difference between the expected displacement and the actual displacement of the closest image along the path of motion.
e. Finally, the selected or interpolated image may optionally be processed using Dolby Color Volume Mapping (CVM), using the dynamic range metadata associated with the selected frame and the characteristics of the display panel of the playback device. For interpolated frames, the metadata may also be interpolated metadata. The mapped image is then displayed to the user. If the image is pre-decoded in step (b), the CVM operation may be applied to the source frame so that it is applied only once, instead of every time the view changes.
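A condensed sketch of steps (a) through (d) is given below. The video decoder is represented by a placeholder callable, interpolation between neighboring frames and the color volume mapping of step (e) are omitted, and the affine adaptation is approximated by simple cropping; it is illustrative only and not the playback implementation itself.

```python
import math

def render_viewpoint(desired_fraction, positions_mm, decode_frame,
                     vertical_shift_px=0, zoom=1.0):
    """Sketch of playback steps (a)-(d): select the captured frame closest to
    the desired position along the motion path, decode it, then apply a crude
    vertical shift and zoom as an affine-style adaptation.

    positions_mm may be accumulated from the per-frame displacement metadata;
    decode_frame(index) is a placeholder for the device's video decoder."""
    # (a) map the desired viewpoint to the nearest recorded position
    dist = [0.0]
    for (x0, y0), (x1, y1) in zip(positions_mm, positions_mm[1:]):
        dist.append(dist[-1] + math.hypot(x1 - x0, y1 - y0))
    target = desired_fraction * dist[-1]
    index = min(range(len(dist)), key=lambda i: abs(dist[i] - target))
    # (b) decode the selected frame
    image = decode_frame(index)
    # (d) adapt: crop-based vertical shift and center zoom
    h, w = image.shape[:2]
    if vertical_shift_px:
        shift = max(-h + 1, min(h - 1, vertical_shift_px))
        image = image[max(0, -shift): h - max(0, shift)]
    if zoom > 1.0:
        h, w = image.shape[:2]
        dh, dw = int(h * (1 - 1 / zoom) / 2), int(w * (1 - 1 / zoom) / 2)
        image = image[dh:h - dh, dw:w - dw]   # a real implementation would rescale the crop
    return image
```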
For a binocular 3D display, the process may be repeated for two adjacent views corresponding to the position of each eye. For auto-stereoscopic or multi-view displays, additional views may be rendered according to the capabilities of the display device to provide parallax viewing for multiple viewers and/or to achieve a smoother and faster responding playback experience.
Various embodiments will now be described with reference to several flow diagrams in the accompanying drawings. Fig. 7A illustrates a general example of a method for recording or creating a volumetric image according to one embodiment. The method may begin with recording along a path of motion in front of the scene (10A as shown in fig. 1). In operation 201, the camera may record a series of images as the camera moves along a motion path relative to the scene (e.g., record video frames when the camera is set to a video recording mode). Then in operation 203, the camera (or a device containing or coupled to the camera) may store motion path metadata about the path (as described above). The motion path metadata is then associated with a series of images in operation 205 so that the motion path metadata can be used later on in playback on a playback device to find an image or frame at a desired viewpoint. In addition, the series of images may be tagged or marked with a metadata tag (e.g., a volumetric image tag) that indicates that the series of images collectively provide the volumetric image. The series of images may then be compressed (e.g., using standard video compression methods for compressing movies) and assembled into a package with the motion path metadata and the volume label for distribution in operation 207. The method shown in fig. 7A may be performed by one device (e.g., a smart phone) or several devices (e.g., a smart phone that records the scene and stores the motion path metadata and another device such as a laptop that performs operation 207).
The package for distribution into which the series of images, the motion path metadata, and the volume label metadata may be assembled may be a data package or a file package for distribution. The package may be a single data file or a single file directory/folder.
Fig. 7B illustrates a more detailed example of a method for recording or creating a volumetric image according to one embodiment. The method may begin with recording along a path of motion in front of the scene (10A as shown in fig. 1). In operation 251, a single camera may record a series of images (e.g., video frames when the camera is set to a video recording mode) as the camera moves along a path of motion (e.g., a horizontal line) relative to the scene; in operation 251, one or more guides may be displayed to the user who is moving the camera to help the user move the camera along the motion path while recording the scene. Then in operation 253, the camera (or a device containing or coupled to the camera) may store motion path metadata about the path (as described above) and also store dynamic range metadata about images along the motion path. In operation 255, the recorded series of images may be associated with the stored motion path metadata and dynamic range metadata and the volumetric image tag; in one embodiment, operation 255 may occur near the end of the method in fig. 7B (e.g., after operation 261). In operation 257, the data processing system may adapt the series of images to correct for deviations outside the expected motion path; for example, as described above, the system may crop the image and interpolate the image to correct for vertical deviations outside of the expected horizontal motion path. In operation 259, the data processing system may adapt or smooth the displacement between the images along the motion path; for example, as described above, the system may add or remove images to smooth the displacement. In operation 261, the data processing system may update the motion path metadata based on the adaptations made in operations 257 and 259; for example, if the image is removed, moved or interpolated, this may change the position information of the final image in the video, and thus the motion path metadata should be updated so that the playback device has the correct metadata after these adaptations. The data processing system may then compress the final series of images and assemble the final series of images into a package with final motion path metadata and volume labels for distribution (e.g., through one or more networks such as the internet) in operation 263. The method shown in fig. 7B may be performed by one device (e.g., a smart phone) or several devices (e.g., a smart phone that records a scene and stores motion path metadata and another device such as a laptop that performs operations 255, 257, 259, 261, and 263).
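As a final illustration, the overall recording flow of operations 251-263 might be orchestrated as in the sketch below. Every helper (capture, dynamic range analysis, path adaptation, smoothing, encoding, packaging) is a hypothetical placeholder injected as a callable; the earlier sketches in this description suggest possible implementations for some of them.

```python
def create_limited_volumetric_image(record, compute_dynamic_range, align_to_intended_path,
                                    smooth_displacements, encode_motion_path,
                                    assemble_package, out_path):
    """Sketch of the recording flow of Fig. 7B; all helpers are placeholders."""
    frames, displacements = record()                                  # operation 251
    dr_metadata = [compute_dynamic_range(f) for f in frames]          # operation 253
    frames, positions = align_to_intended_path(frames, displacements) # operation 257
    frames, positions = smooth_displacements(frames, positions)       # operation 259
    motion_path = encode_motion_path(positions)                       # operation 261
    assemble_package(out_path, frames, motion_path, dr_metadata,      # operations 255/263
                     tag="limited volumetric image")
```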
Fig. 8A and 8B illustrate a method for displaying a volumetric image. Fig. 8A illustrates a general example of a method for displaying one or more images from a volumetric image. In operation 301, a playback device receives a volumetric image; the receiving may be through a distribution channel (e.g., the internet) or from local storage on the playback device if the playback device is also a recording device that creates the volumetric image. In operation 303, the playback device determines a desired viewpoint of the volumetric image (e.g., the user selects the viewpoint by using the user interface element in fig. 8D, or the device determines the viewpoint by a sensor on the device (e.g., sensor 355 in fig. 8C)). Then, in operation 305, the playback device determines a selected image based on the desired viewpoint of the volumetric image and the motion path metadata. Then, in operation 307, the playback device displays the selected image.
Fig. 8B illustrates a more detailed method for displaying one or more images from a volumetric image. In operation 325, the playback device receives the volumetric image; the receiving may be through a distribution channel (e.g., the internet) or from local storage on the playback device if the playback device is also a recording device that creates the volumetric image. In operation 327, the playback device may display a user interface for selecting a desired viewpoint; the user interface may be a slider or arrow displayed over an image in the volumetric image. FIG. 8D shows examples of arrows 365 and 367 on a display 363 of a playback device 361 that may be selected by a user; arrow 365 may be selected to move to the left viewpoint (which may be closer to the start of the motion path) and arrow 367 may be selected to move to the right viewpoint (which may be closer to the end of the motion path). The display 363 may display one of the images from the volumetric image while receiving a user selection of one of the arrows 365 and 367. In another embodiment, the user interface may be a touch screen that may receive a user swipe or other gesture indicating a selection of a desired viewpoint. In another embodiment, a sensor on the playback device may detect the head and the position of the head or gaze point of the eyes of the user to determine the desired gaze point; fig. 8C shows an example of a device 351 including a front camera 355 over a display 353. In one embodiment, the playback device may include a variety of ways (e.g., arrows and sensors displayed) that allow the user to select a desired viewpoint. In operation 329, the playback device determines a desired viewpoint based on the received input (e.g., user input or input to a method from a sensor such as front camera 355). Then in operation 331, the playback device determines which image to display based on the desired viewpoint from operation 329, the images available in the series of images, and the motion path metadata. This determination of the image has been described above. In some cases, it may be necessary or desirable to interpolate to create a new image from the neighboring images to provide a good match to the desired viewpoint, and this may be done in operation 333 (using the techniques described above). The playback device may then use the dynamic range metadata to adjust the image data from the selected image to match the display capabilities of the display device used on the playback device for displaying the selected image in operation 335. Finally, in operation 337, the selected image from the volumetric image is displayed on a display device of the playback device.
FIG. 9 illustrates an example of a data processing system 800 that may be used in accordance with one or more embodiments described herein. For example, the system 800 may be used to perform any of the methods described herein, such as the methods shown in Figs. 7A, 7B, 8A, and 8B. The data processing system may create a volumetric image and the associated metadata for consumption by a playback system, and the data processing system may itself be the playback system; further, the data processing system in Fig. 9 may be a system that performs the creation operations after recording by a camera. Note that while Fig. 9 illustrates various components of a device, it is not intended to represent any particular architecture or manner of interconnecting the components, as such details are not pertinent to the present disclosure. It will also be appreciated that network computers and other data processing systems or other consumer electronic devices, which have fewer components or perhaps more components, may also be used with embodiments of the present disclosure. In one embodiment, the data processing system may be a smartphone or other mobile device. In one embodiment, the data processing system shown in Fig. 9 may include a camera as one of its peripheral devices (e.g., 815) so that the data processing system can record the series of images and can also collect motion path metadata while capturing the series of images; the motion path metadata may be obtained from sensors (e.g., accelerometers) as known in the art, and these sensors may also be peripheral devices in the data processing system. Such a data processing system may also associate the motion path metadata with the series of images and optionally create a distributable package containing the series of images, the motion path metadata, and the volumetric image tag. In another embodiment, the data processing system in Fig. 9 may be coupled to a camera to receive a series of images (and sensor data regarding displacement, or to generate such sensor data), and the data processing system then creates the association between the motion path metadata and the series of images.
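One hedged way to picture the distributable package mentioned above is the sketch below; the container format (a zip archive with a JSON manifest) and all names are hypothetical and are not specified by the patent, which only requires that the series of images, the motion path metadata, and the volumetric image tag be assembled together.

```python
import json
import os
import zipfile

def assemble_package(image_paths, displacements, out_path="volumetric_capture.zip"):
    # Manifest carrying the motion path metadata and the metadata tag that marks
    # the series of images as a (limited) volumetric image.
    manifest = {
        "volumetric_image": True,                        # metadata tag
        "images": [os.path.basename(p) for p in image_paths],
        "motion_path_displacements": displacements,      # displacement from each image to the next
    }
    with zipfile.ZipFile(out_path, "w") as pkg:
        pkg.writestr("manifest.json", json.dumps(manifest, indent=2))
        for p in image_paths:
            pkg.write(p, arcname=os.path.basename(p))    # the recorded series of images
    return out_path
```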
As shown in Fig. 9, device 800, which is one form of data processing system, includes a bus 803 coupled to microprocessor(s) 805, ROM (read only memory) 807, volatile RAM 809, and nonvolatile memory 811. The microprocessor(s) 805 may fetch instructions from the memories 807, 809, 811 and execute the instructions to perform the operations described above. Microprocessor(s) 805 may contain one or more processing cores. Bus 803 interconnects these various components and also couples components 805, 807, 809, and 811 to a display controller and display device 813 and to peripheral devices such as input/output (I/O) devices 815 (which may be touch screens, mice, keyboards, modems, network interfaces, printers, one or more cameras, and other devices known in the art). Typically, the input/output devices 815 are coupled to the system through input/output controller 810. The volatile RAM (random access memory) 809 is typically implemented as dynamic RAM (DRAM), which requires power continually in order to refresh or maintain the data in the memory.
The non-volatile memory 811 is typically a magnetic hard disk drive or a magneto-optical drive or an optical drive or a DVD RAM or flash memory or another type of memory system that maintains data (e.g., large amounts of data) even after power is removed from the system. Typically, the non-volatile memory 811 will also be a random access memory, although this is not required. Although Fig. 9 illustrates the nonvolatile memory 811 as a local device coupled directly to the rest of the components in the data processing system, it should be understood that embodiments of the present disclosure may utilize non-volatile memory that is remote from the system (e.g., a network storage device coupled to the data processing system through a network interface such as a modem, Ethernet interface, or wireless network). Bus 803 may include one or more buses connected to each other through various bridges, controllers, and/or adapters, as is known in the art.
Portions of the above description may be implemented with logic circuitry, such as dedicated logic circuitry, or with a microcontroller or other form of processing core that executes program code instructions. Thus, the processes taught by the discussion above may be performed with program code, such as machine-executable instructions, that cause a machine executing the instructions to perform certain functions. In this context, a "machine" may be a machine that converts intermediate form (or "abstract") instructions into processor-specific instructions (e.g., an abstract execution environment such as a "virtual machine" (e.g., a Java virtual machine), an interpreter, a common language runtime, a high-level language virtual machine, etc.), and/or electronic circuitry disposed on a semiconductor chip (e.g., "logic circuitry" implemented with transistors) designed to execute instructions, such as a general-purpose processor and/or a special-purpose processor. The processes taught by the discussion above may also be performed (as an alternative to a machine or in combination with a machine) by electronic circuitry designed to perform the processes (or a portion thereof) without the execution of program code.
The present disclosure also relates to an apparatus for performing the operations described herein. The apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose device selectively activated or reconfigured by a computer program stored in the device. Such a computer program may be stored in a non-transitory computer-readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magneto-optical disks, DRAMs (volatile), flash memory, read-only memories (ROMs), RAMs, EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, each coupled to a device bus.
A machine-readable medium includes any mechanism for storing information in a form readable by a machine (e.g., a computer). For example, non-transitory machine-readable media include read only memory ("ROM"); random access memory ("RAM"); a magnetic disk storage medium; an optical storage medium; a flash memory device; etc.
An article of manufacture may be used to store program code. The article of manufacture storing the program code may be embodied as, but is not limited to, one or more non-transitory memories (e.g., one or more flash memories, random access memories (static, dynamic or other)), optical disks, CD-ROMs, DVD ROMs, EPROMs, EEPROMs, magnetic or optical cards or other type of machine-readable media suitable for storing electronic instructions. The program code may also be downloaded from a remote computer (e.g., a server) to a requesting computer (e.g., a client) by way of data signals embodied in a propagation medium (e.g., via a communication link (e.g., a network connection)), and then stored in a non-transitory memory of the client computer (e.g., DRAM or flash memory or both).
The foregoing detailed description has been presented in terms of algorithms and symbolic representations of operations on data bits within a device memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, considered to be a self-consistent sequence of operations leading to a desired result. The operations are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the above discussion, it is appreciated that throughout the description, discussions utilizing terms such as "receiving," "determining," "sending," "terminating," "waiting," "changing," or the like, refer to the action and processes of a device, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the device's registers and memories into other data similarly represented as physical quantities within the device memories or registers or other such information storage, transmission or display devices.
The processes and displays presented herein are not inherently related to any particular device or other apparatus. Various general-purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the described operations. The required structure for a variety of these systems will be apparent from the description above. In addition, the present disclosure is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the disclosure as described herein.
Exemplary embodiments of the invention
The following text presents numbered embodiments in a format similar to claims, and it should be understood that these embodiments may be presented as claims in one or more future applications, such as one or more continuation or divisional applications.
Although individual embodiments are described in detail below, it should be understood that these embodiments may be combined or modified, either in part or in whole. Moreover, each of these embodiments may also be expressed as a method or a data processing system, rather than as a machine-readable medium.
Embodiment 1. A non-transitory machine readable medium storing executable program instructions which, when executed by a data processing system, cause the data processing system to perform a method comprising:
recording a series of images of a scene by a single camera as the single camera moves along a path relative to the scene;
storing first motion path metadata about the path;
associating the series of images with the first motion path metadata; and
associating a metadata tag with the series of images, the metadata tag indicating that the recorded series of images represents a volumetric image of the scene.
Embodiment 2. The non-transitory machine-readable medium of embodiment 1, wherein the recording comprises capturing and storing images continuously over a period of time, and the capturing is performed at a predetermined frame rate for displaying video or at a rate based on the camera moving along the path.
Embodiment 3. The non-transitory machine-readable medium of embodiment 1 or 2, wherein a volumetric image comprises a set of images of the scene from different camera locations or viewpoints, and wherein the series of images, the associated first motion path metadata, and the metadata tag are assembled into a package, and wherein the series of images are compressed in the package.
Embodiment 4. The non-transitory machine-readable medium of any one of embodiments 1-3, wherein the method further comprises:
adapting the series of images to a desired motion path; and
updating the first motion path metadata based on the adapted series of images.
Embodiment 5. The non-transitory machine-readable medium of embodiment 4, wherein the adapting is vertically cropping at least one or more images of the series of images, and wherein updating the first motion path metadata is based on changes in the series of images due to the vertical cropping.
Embodiment 6. The non-transitory machine-readable medium of embodiment 5, wherein the desired path of motion is along a horizontal line.
Embodiment 7. The non-transitory machine-readable medium of any one of embodiments 1 to 6, wherein the method further comprises:
adjusting one or more positions of one or more images in the series of images to smooth one or more displacements between images in the series of images along the desired path of motion.
Embodiment 8. The non-transitory machine-readable medium of any one of embodiments 1 to 7, wherein the method further comprises:
displaying one or more guides during the recording to guide a user moving the single camera along the desired path of motion.
Embodiment 9. The non-transitory machine readable medium of any one of embodiments 1-8, wherein the first motion path metadata indicates a displacement along the path from one image to a next image in the series of images, and wherein optionally, upon playback, the recording supports (1) displaying a single image at a desired viewpoint, and (2) displaying the series of images as a movie.
Embodiment 10. The non-transitory machine-readable medium of any one of embodiments 1-9, wherein the method further comprises:
storing distance metadata that provides an estimate of a distance between one or more objects in the scene and the single camera;
storing dynamic range metadata, the dynamic range metadata indicating a dynamic range for each image in a set of images in the series of images, the dynamic range for each image indicating a luminance range.
Embodiment 11. A non-transitory machine readable medium storing executable program instructions which, when executed by a data processing system, cause the data processing system to perform a method comprising:
receiving a series of images and associated motion path metadata and a volume metadata tag, the volume metadata tag indicating that the series of images represent a volume image of a scene;
determining a desired viewpoint of the volumetric image;
determining a selected image from the desired viewpoint based on the series of images;
displaying the selected image.
Embodiment 12. The non-transitory machine-readable medium of embodiment 11, wherein determining the selected image is based on a comparison of the desired viewpoint and the motion path metadata.
Embodiment 13. The non-transitory machine-readable medium of embodiment 11 or 12, wherein the series of images are recorded in a single camera during successive capturing and storing of images over a period of time along a path of motion of the single camera, and the capturing is performed at a predetermined frame rate for displaying video.
Embodiment 14. The non-transitory machine readable medium of any of embodiments 11-13, wherein a volumetric image comprises a set of images of the scene from different camera locations or viewpoints, and wherein the series of images, the associated motion path metadata, and the volumetric metadata tag are received as a package, and wherein the series of images are compressed in the package.
Embodiment 15. The non-transitory machine readable medium of any one of embodiments 11-14, wherein the motion path metadata indicates a displacement from one image to a next image in the series of images along a path used during recording of the series of images, and the motion path metadata is used at playback to select the desired viewpoint on the scene for display, and wherein at playback the recording supports (1) displaying a single image at the desired viewpoint, and (2) displaying the series of images as a movie.
Embodiment 16. The non-transitory machine-readable medium of any one of embodiments 11-15, wherein the desired viewpoint is determined according to one of: (1) manual user selection from a user interface, or (2) sensor-based tracking of a user's face or head, or (3) a predetermined set of one or more viewpoints provided by a content creator.
Embodiment 17. The non-transitory machine-readable medium of embodiment 16, wherein the sensor-based tracking automatically determines the desired viewpoint from a position of the viewer's head detected by the sensor.
Embodiment 18. The non-transitory machine-readable medium of any one of embodiments 11 to 17, wherein the method further comprises: the selected image is adapted by scaling the selected image with an affine transformation or vertically shifting the image.
Embodiment 19. The non-transitory machine-readable medium of any one of embodiments 11-18, wherein the method further comprises: receiving dynamic range metadata, the dynamic range metadata indicating a dynamic range for each image in a set of images in the series of images, the dynamic range for each image indicating a luminance range; and mapping the selected image to a dynamic range capability of a target display based on dynamic range metadata of the selected image.
Embodiment 20. The non-transitory machine-readable medium of any one of embodiments 11-19, wherein the selected image is interpolated from a set of images in the series of images, the set of images representing a match between the desired viewpoint and the motion path metadata.
Embodiment 21. The non-transitory machine readable medium of any one of embodiments 1 to 20, wherein the time at which each image in the series of images is captured is based on motion along the path, the motion detected by a device comprising the single camera.
Embodiment 22. The non-transitory machine-readable medium of embodiment 21, wherein the time at which each image in the series of images is captured is not predetermined, but is based on displacement data of the device as it moves along the path.
Embodiment 23. The non-transitory machine-readable medium of embodiment 22, wherein the time at which each image in the series of images is captured is selected to adapt the pitch of the images along the path.
Embodiment 24. An apparatus having one or more processors to perform the method of any of embodiments 1 to 23.
Embodiment 25. A method according to any of embodiments 1 to 23.
Embodiment 26. An apparatus, comprising: a camera for capturing a series of images of a scene as the device moves along a path relative to the scene; a memory coupled to the camera for storing the captured series of images; and a set of one or more sensors for generating motion path metadata indicative of a displacement along the path from one image to a next image in the series of images.
Embodiment 27. The apparatus of embodiment 26, wherein the apparatus is coupled to a processing system and the apparatus provides the captured series of images and the motion path metadata to the processing system.
Embodiment 28. The apparatus of embodiment 27, wherein the processing system performs the method of any of embodiments 1-23.
Embodiment 29. A method performed by a data processing system, the method comprising: receiving a series of images of a scene from a single camera, the series of images captured as the single camera moves along a path relative to the scene;
storing first motion path metadata about the path;
associating the series of images with the first motion path metadata; and associating a metadata tag with the series of images, the metadata tag indicating that the series of images represents a volumetric image of the scene.
Embodiment 30. The method of embodiment 29 wherein the series of images is captured by continuously capturing and storing images over a period of time and the capturing is performed at a predetermined frame rate for displaying video or at a rate based on the camera moving along the path, and wherein a volumetric image comprises a set of images of the scene from different camera positions or viewpoints, and wherein the series of images, the associated first motion path metadata, and the metadata tag are assembled into a package, and wherein the series of images are compressed in the package.
Embodiment 31. The method of embodiment 29 or 30, wherein the method further comprises: adapting the series of images to a desired motion path; and updating the first motion path metadata based on the adapted series of images.
Embodiment 32. The method of embodiment 31, wherein the adapting is vertically cropping at least one or more images in the series of images, and wherein updating the first motion path metadata is based on changes in the series of images due to the vertical cropping.
Embodiment 33. The method of any of embodiments 29 to 32, wherein the method further comprises: one or more positions of one or more images in the series of images are adjusted to smooth one or more displacements between images in the series of images along the desired path of motion.
Embodiment 34. The method of any of embodiments 29 to 33, wherein the method further comprises: one or more guides are displayed during the recording to guide a user moving the single camera along the desired path of motion.
Embodiment 35. The method of any one of embodiments 29-34, wherein the first motion path metadata indicates a displacement along the path from one image to a next image in the series of images.
Embodiment 36. The method of any one of embodiments 29 to 34, wherein the method further comprises one or both of the following operations: storing distance metadata that provides an estimate of a distance between one or more objects in the scene and the single camera; and/or storing dynamic range metadata indicating a dynamic range of each image of a set of images in the series of images, the dynamic range of each image indicating a luminance range.
Aspects of the invention may be understood from the enumerated example embodiments (EEEs) below:
EEE1. A non-transitory machine readable medium storing executable program instructions which, when executed by a data processing system, cause the data processing system to perform a method comprising:
recording a series of images of a scene by a single camera as the single camera moves along a path relative to the scene;
storing first motion path metadata about the path;
associating the series of images with the first motion path metadata; and
associating a metadata tag with the series of images, the metadata tag indicating that the recorded series of images represents a volumetric image of the scene.
EEE2. The non-transitory machine readable medium of EEE 1 wherein the recording comprises capturing and storing images continuously over a period of time, and the capturing is performed at a predetermined frame rate for displaying video or at a rate based on the camera moving along the path.
EEE3. The non-transitory machine readable medium of EEE 1 or EEE2 wherein a volumetric image comprises a set of images of the scene from different camera positions or viewpoints, and wherein the series of images, the associated first motion path metadata, and the metadata tag are assembled into a package, and wherein the series of images are compressed in the package.
EEE4. The non-transitory machine readable medium of any one of EEEs 1-3 wherein the method further comprises:
adapting the series of images to a desired motion path; and
updating the first motion path metadata based on the adapted series of images.
EEE5. The non-transitory machine readable medium of EEE4 wherein the adapting is vertically cropping at least one or more images of the series of images, and wherein updating the first motion path metadata is based on changes in the series of images due to the vertical cropping.
EEE6. The non-transitory machine readable medium of EEE 5 wherein the desired path of motion is along a horizontal line.
EEE7. The non-transitory machine readable medium of any one of EEEs 1-6 wherein the method further comprises:
adjusting one or more positions of one or more images in the series of images to smooth one or more displacements between images in the series of images along the desired path of motion.
EEE8. The non-transitory machine readable medium of any one of EEEs 1-7 wherein the method further comprises:
displaying one or more guides during the recording to guide a user moving the single camera along the desired path of motion.
EEE9. The non-transitory machine readable medium of any one of EEEs 1-8 wherein the first motion path metadata indicates a displacement along the path from one image to a next image in the series of images.
EEE10. The non-transitory machine readable medium of any one of EEEs 1-9, wherein the method further comprises one or both of:
storing distance metadata that provides an estimate of a distance between one or more objects in the scene and the single camera;
storing dynamic range metadata, the dynamic range metadata indicating a dynamic range for each image in a set of images in the series of images, the dynamic range for each image indicating a luminance range.
EEE11. A non-transitory machine readable medium storing executable program instructions which, when executed by a data processing system, cause the data processing system to perform a method comprising:
receiving a series of images and associated motion path metadata and a volume metadata tag, the volume metadata tag indicating that the series of images represent a volume image of a scene;
determining a desired viewpoint of the volumetric image;
determining a selected image from the desired viewpoint based on the series of images;
displaying the selected image.
EEE12. The non-transitory machine readable medium of EEE11 wherein determining the selected image is based on a comparison of the desired viewpoint and the motion path metadata.
EEE13. The non-transitory machine readable medium of EEE11 or EEE12 wherein the series of images are recorded in a single camera during successive capturing and storing of images over a period of time along a path of motion of the single camera, and the capturing is performed at a predetermined frame rate for displaying video.
EEE14. The non-transitory machine readable medium of any one of EEEs 11-13 wherein a volumetric image comprises a set of images of the scene from different camera positions or viewpoints, and wherein the series of images, the associated motion path metadata, and the volumetric metadata tag are received as a package, and wherein the series of images are compressed in the package.
EEE15. The non-transitory machine readable medium of any one of EEEs 11-14 wherein the motion path metadata indicates a displacement from one image to a next image in the series of images along a path used during recording the series of images and the motion path metadata is used at playback to select the desired viewpoint on the scene for display, and wherein at playback the recording supports (1) displaying a single image at the desired viewpoint and (2) displaying the series of images as a movie.
EEE16. The non-transitory machine readable medium of any one of EEEs 11-15 wherein the desired viewpoint is determined according to one of: (1) manual user selection from a user interface, or (2) sensor-based tracking of a user's face or head, or (3) a predetermined set of one or more viewpoints provided by a content creator.
EEE17. The non-transitory machine readable medium of EEE 16 wherein the sensor-based tracking automatically determines the desired viewpoint from a position of the viewer's head detected by the sensor.
EEE18. The non-transitory machine readable medium of any one of EEEs 11-17 wherein the method further comprises: the selected image is adapted by scaling the selected image with an affine transformation or vertically shifting the image.
EEE19. The non-transitory machine readable medium of any one of EEEs 11-18 wherein the method further comprises: receiving dynamic range metadata, the dynamic range metadata indicating a dynamic range for each image in a set of images in the series of images, the dynamic range for each image indicating a luminance range; and mapping the selected image to a dynamic range capability of a target display based on dynamic range metadata of the selected image.
EEE20. The non-transitory machine readable medium of any one of EEEs 11-19 wherein the selected image is interpolated from a set of images in the series of images that represent a match between the desired viewpoint and the motion path metadata.
EEE21. A data processing system which performs the method of any one of EEEs 1 to 20.
EEE22. A non-transitory machine readable medium storing executable program instructions which, when executed by a data processing system, cause the data processing system to perform the method of any one of EEEs 1 to 20.

Claims (24)

1. A method performed by a data processing system, the method comprising:
receiving, from a single camera, a series of images of a scene captured as the single camera moves along a path relative to the scene;
storing first motion path metadata about the path;
associating the series of images with the first motion path metadata; and
associating a metadata tag with the series of images, the metadata tag indicating that the series of images represents a volumetric image of the scene.
2. The method of claim 1, wherein a volumetric image comprises a set of images of the scene from different camera positions or viewpoints.
3. The method of claim 1 or 2, wherein the capturing is performed at a rate based on movement of the camera along the path.
4. The method of claim 1 or 2, wherein the capturing is performed at a predetermined frame rate for displaying video.
5. The method of any of claims 1-4, wherein the series of images, the associated first motion path metadata, and the metadata tag are assembled into a package, and wherein the series of images are compressed in the package.
6. The method of any one of claims 1 to 5, wherein the series of images are captured and stored continuously over a period of time.
7. The method of any one of claims 1 to 6, wherein the method further comprises:
adapting the series of images to a desired motion path; and
updating the first motion path metadata based on the adapted series of images.
8. The method of claim 7, wherein the adapting is vertically cropping at least one or more images of the series of images, and wherein updating the first motion path metadata is based on changes in the series of images due to the vertical cropping.
9. The method of any one of claims 1 to 8, wherein the method further comprises:
adjusting the position of one or more images in the series of images to smooth one or more displacements between images in the series of images along the desired path of motion.
10. The method of claim 8, wherein the method further comprises:
displaying one or more guides during capturing of the series of images to guide a user moving the single camera along the desired motion path.
11. The method of any of claims 1 to 10, wherein the first motion path metadata indicates a displacement along the path from one image to a next image in the series of images.
12. The method of any one of claims 1 to 11, wherein the method further comprises:
storing distance metadata that provides an estimate of a distance between one or more objects in the scene and the single camera; and/or
storing dynamic range metadata, the dynamic range metadata indicating a dynamic range for each image in a set of images in the series of images, the dynamic range for each image indicating a luminance range.
13. A method performed by a data processing system, the method comprising:
receiving a series of images and associated motion path metadata and a volume metadata tag, the volume metadata tag indicating that the series of images represent a volume image of a scene;
determining a desired viewpoint of the volumetric image;
determining a selected image from the desired viewpoint based on the series of images;
displaying the selected image.
14. The method of claim 13, wherein a volumetric image comprises a set of images of the scene from different camera positions or viewpoints.
15. The method of claim 13 or 14, wherein determining the selected image is based on a comparison of the desired viewpoint and the motion path metadata.
16. The method of any of claims 13-15, wherein the series of images is a series of images captured by a single camera as the camera moves along a path, and wherein the series of images are captured at a rate based on the camera moving along the path.
17. The method of any of claims 13 to 15, wherein the series of images is a series of images captured by a single camera as the camera moves along a path, and wherein the series of images are captured at a predetermined rate for displaying video.
18. The method of any of claims 13 to 17, wherein the series of images are recorded in a single camera during successive capturing and storing of images over a period of time along a path of motion of the single camera.
19. The method of any of claims 13 to 18, wherein the series of images, the associated motion path metadata, and the volume metadata tag are received as a package, and wherein the series of images are compressed in the package.
20. The method of any of claims 13 to 19, wherein the motion path metadata indicates a displacement from one image to a next image in the series of images along a path used during recording of the series of images, and the motion path metadata is used at playback to select the desired viewpoint on the scene for display, and wherein at playback the recording supports (1) displaying a single image at the desired viewpoint, and (2) displaying the series of images as a movie.
21. The method of any of claims 13 to 20, wherein the desired viewpoint is determined from one of: (1) manual user selection from a user interface, or (2) sensor-based tracking of a user's face or head, or (3) a predetermined set of one or more viewpoints provided by a content creator.
22. The method of claim 21, wherein the sensor-based tracking automatically determines the desired viewpoint from a position of a viewer's head detected by the sensor.
23. A data processing system which performs the method of any of claims 1 to 22.
24. A non-transitory machine readable medium storing executable program instructions which, when executed by a data processing system, cause the data processing system to perform the method of any one of claims 1 to 22.
CN202280027069.6A 2021-05-06 2022-05-05 System and method for processing volumetric images Pending CN117099368A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US202163185082P 2021-05-06 2021-05-06
EP21172493.5 2021-05-06
US63/185,082 2021-05-06
PCT/US2022/027918 WO2022235969A1 (en) 2021-05-06 2022-05-05 Systems and methods for processing volumetric images

Publications (1)

Publication Number Publication Date
CN117099368A (en) 2023-11-21

Family

ID=88777696

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202280027069.6A Pending CN117099368A (en) 2021-05-06 2022-05-05 System and method for processing volumetric images

Country Status (1)

Country Link
CN (1) CN117099368A (en)

Similar Documents

Publication Publication Date Title
US20220058846A1 (en) Artificially rendering images using viewpoint interpolation and extrapolation
US10733475B2 (en) Artificially rendering images using interpolation of tracked control points
US11636637B2 (en) Artificially rendering images using viewpoint interpolation and extrapolation
US10650574B2 (en) Generating stereoscopic pairs of images from a single lens camera
US10719939B2 (en) Real-time mobile device capture and generation of AR/VR content
US10540773B2 (en) System and method for infinite smoothing of image sequences
US10726560B2 (en) Real-time mobile device capture and generation of art-styled AR/VR content
US20230377183A1 (en) Depth-Aware Photo Editing
US10750161B2 (en) Multi-view interactive digital media representation lock screen
US10726593B2 (en) Artificially rendering images using viewpoint interpolation and extrapolation
US20220060639A1 (en) Live style transfer on a mobile device
US8928729B2 (en) Systems and methods for converting video
US10586378B2 (en) Stabilizing image sequences based on camera rotation and focal length parameters
US8611642B2 (en) Forming a steroscopic image using range map
WO2013074561A1 (en) Modifying the viewpoint of a digital image
US10861213B1 (en) System and method for automatic generation of artificial motion blur
CN117099368A (en) System and method for processing volumetric images
WO2022235969A1 (en) Systems and methods for processing volumetric images
EP4156692A1 (en) Presentation of multi-view video data
EP4246988A1 (en) Image synthesis
KR20240026222A (en) Create image
WO2017011700A1 (en) Artificially rendering images using viewpoint interpolation and extrapolation

Legal Events

Date Code Title Description
PB01 Publication