WO2013081576A1

WO2013081576A1 - Capturing a perspective-flexible, viewpoint-synthesizing panoramic 3d image with a multi-view 3d camera

Info

Publication number: WO2013081576A1
Application number: PCT/US2011/062227
Authority: WO
Inventors: Henry Harlyn Baker
Original assignee: Hewlett-Packard Development Company, L.P.
Priority date: 2011-11-28
Filing date: 2011-11-28
Publication date: 2013-06-06
Also published as: TW201327019A

Abstract

A perspective-flexible, viewpoint-synthesizing multi-view camera for capturing a panoramic 3D image is provided. The camera includes a housing to arrange a plurality of imagers in a grid. The plurality of imagers has a plurality of focal lengths and a baseline length, the plurality of focal lengths and the baseline length selected to provide a change in disparity at a distant site that approximately matches a change in disparity at a desired viewpoint.

Description

CAPTURING A PERSPECTIVE-FLEXIBLE, VIEWPOINT-SYNTHESIZING PANORAMIC 3D IMAGE WITH A MULTI-VIEW 3D CAMERA

BACKGROUND

[0001] The capture of multi-view three-dimensional ("3D") information to provide viewers a true 3D experience from multiple viewpoints is becoming increasingly popular. Viewers at entertainment events such as sports tournaments, fashion shows, 3D movies, concerts and the like can now have access to more realistic and at times immersive viewing experiences that try to emulate the richness of the 3D information that may be captured. Traditionally, the capture of 3D information has been achieved by placing a pair of imagers (e.g., camera sensors) a specific distance apart, aligning them for correct perception, and synchronizing the capture of stereo image pairs which are then combined. Careful selection of the separation between the imagers is a critical element in conveying the intended perception. If they are too close together, then viewers (who cannot adjust the separation of their eyes) have the impression of a scene with very little depth range. If the imagers are too far apart, then the viewers see exaggerated 3D, with protruding objects, for example, appearing very long.

[0002] Cinematographers can accommodate to this by maintaining at all times viewing conditions that are identical to those of the average human: approximately 65 mm between a viewer's eyes and a central viewing angle of about 53°. Often, however, the imagers cannot be placed at the desired location for positioning the viewers for the perspective desired. In these cases, the imagers may have to be moved to another accessible position. At a greater distance, the imagers' lenses may no longer be appropriate for the desired content, and longer lenses may have to be employed to acquire the desired perspective. This displacement and enhanced focal length may disrupt the desired perception, as it may result in imagery with very little depth range. To re-establish the correct depth range, one must accommodate with a change in imager separation, for example, by doubling the separation if the capture position is twice as far from the desired perceiving position. Providing a satisfying 3D experience to viewers therefore places constraints on the relationships between view distance, focal length, and perceived range. BRIEF DESCRIPTION OF THE DRAWINGS

[0003] The present application may be more fully appreciated in connection with the following detailed description taken in conjunction with the accompanying drawings, in which like reference characters refer to like parts throughout, and in which:

[0004] FIG. 1 is a schematic diagram illustrating an example 3D capture site in which the embodiments may be implemented;

[0005] FIG. 2 is a schematic diagram illustrating the geometry required to achieve approximately the same change in disparity at an actual capture site as a desired viewpoint;

[0006] FIG. 3 is a flowchart for capturing a panoramic 3D image with a perspective- flexible, viewpoint-synthesizing multi-view camera;

[0007] FIG. 4 is a flowchart for forming a composite and panoramic 3D image in accordance to various embodiments;

[0008] FIGS. 5A-B are example schematic diagrams of a perspective-flexible, viewpoint-synthesizing multi-view camera; and

[0009] FIG. 6 is a block diagram of an example computing system for implementing a capture and synthesis module according to the present disclosure.

DETAILED DESCRIPTION

[0010] A perspective-flexible, viewpoint-synthesizing multi-view camera, method, and non-transitory computer readable medium for capturing a panoramic 3D image are disclosed. As generally described herein, a multi-view camera has multiple imagers (e.g., image sensors) to capture multiple views of a scene, with each imager capturing a different view. The multi-view camera may also have multiple lenses integrated with the imagers. The multi-view camera, as described in more detail below, includes multiple imagers and lenses positioned in a line or a grid to capture the multiple views. A capture and synthesizing module connected to the camera receives the multiple views and synthesizes them into a composite and panoramic 3D image.

[0011] In various embodiments, a perspective-flexible, viewpoint-synthesizing multi-view camera is able to provide multiple perspectives of an image scene to viewers at multiple viewpoints while positioned at a location away from the viewpoints. The multi- view camera provides the appearance of adaptive repositioning for different perspectives without any actual physical movement. Adaptive repositioning is achieved by approximately matching the change in disparity (i.e., the difference in projected position of some feature as seen from two eyes) with respect to distance (i.e., selecting two features with a certain range separation) at the actual camera position to the change in disparity that would be observed if the camera were positioned at the desired viewpoints.

[0012] The change in disparity at the actual camera position is computed at the capture and synthesizing module by taking into account the geometrical relationship between binocular disparity, the multi-view camera baseline, the focal length of the lenses used in the multi-view camera, and the range of distances from the desired viewpoints to the image scene being captured. As generally described herein, the binocular disparity refers to the difference in image location perceived by the left and right eyes of a viewer as a result of the eye's horizontal separation. The baseline of the multi-view camera refers to the horizontal distance between the centers of projection of the farthest-apart imagers in the camera.

[0013] It is appreciated that embodiments described herein below may include various components and features. Some of the components and features may be removed and/or modified without departing from a scope of the perspective-flexible, viewpoint- synthesizing multi-view camera, method, and non-transitory computer readable medium for capturing a panoramic 3D image. It is also appreciated that, in the following description, numerous specific details are set forth to provide a thorough understanding of the embodiments. However, it is appreciated that the embodiments may be practiced without limitation to these specific details. In other instances, well known methods and structures may not be described in detail to avoid unnecessarily obscuring the description of the embodiments. Also, the embodiments may be used in combination with each other.

[0014] Reference in the specification to "an embodiment," "an example" or similar language means that a particular feature, structure, or characteristic described in connection with the embodiment or example is included in at least that one example, but not necessarily in other examples. The various instances of the phrase "in one embodiment" or similar phrases in various places in the specification are not necessarily all referring to the same embodiment. As used herein, a module may be a combination of hardware and software executing on that hardware to provide a given functionality.

[0015] Referring now to FIG. 1, a schematic diagram illustrating an example 3D capture site in which the embodiments may be implemented is described. 3D capture site 100 represents a site where 3D information is captured for multiple viewers at multiple viewpoints. Multi-view cameras 105a-c are positioned at a given location away from and according to the 3D information available at a distant site 1 10. The distant site 110 may be, for example, a sports field 110a, a concert stage 1 10b, and a movie scene 1 10c, among others. Multiple viewers 1 15a-g may be positioned at multiple viewpoints away from the distant site 1 10 that are likely different from the actual position of the multi-view cameras 105a-c and nearer to the distant site 110 than the multi-view cameras 105a-c.

[0016] A capture and synthesis module 120 connected to the multi-view cameras 105a-c receives the multiple views captured by the multi-view cameras 105a-c and synthesizes them into a composite and panoramic 3D image 125. The composite and panoramic 3D image 125 may be presented to viewers at a display site (e.g., a movie theater, concert backstage, etc.) that may be different or the same as the capture site 100. In various embodiments, the composite and panoramic 3D image 125 is generated so that when it is presented to viewers at the display site, the viewers have the impression of being at the capture site 100 at the right perspective.

[0017] This is achieved by having the multi-view cameras 105a-c capture multiple views in such a way as to produce a perception as though the cameras were positioned at the viewers' position. Doing so requires the change in disparity with respect to distance at the actual position of the multi-view cameras 105a-c be approximately the same as the change in disparity at a desired position (e.g., at a viewer's 1 15a-e position). As described below for various embodiments, the capture and synthesizing module 120 computes the change in disparity for a desired viewer at the desired position and then determines the baseline length and focal lengths to be used at the multi-view cameras 105a-c when capturing images at the distant site 1 10.

[0018] It is appreciated that although three multi-view cameras 105a-c are shown in FIG. 1 , fewer or more such multi-view cameras may be used. It is also appreciated that the multiple 3D sites HOa-c are shown side-by-side as distant site 1 10 for illustration purposes only. One skilled in the art appreciates that distant site 110 is in fact a single site (e.g., sports field 110a or concert stage 110b or movie scene 1 10c) where 3D information is captured.

[0019] Attention is now directed to FIG. 2, which illustrates the geometry required to achieve approximately the same change in disparity at an actual capture site as a desired viewpoint. Viewpoints 200a-b may represent the position of a viewer's eyes observing 3D information at a distant site 205. The goal is to approximately match the change in disparity experienced at one or more multi-view cameras positioned away from the viewpoints 200a-b to the change in disparity observed by a viewer at viewpoints 200a-b. To do so, the change in disparity for a viewer at viewpoints 200a-b can be calculated by taking into account the geometrical relationship between the binocular disparity 210, the baseline 215 separation between the viewer's eyes (e.g.,- 65 mm), the focal length 220 of the viewer's eyes (e.g., ~ 17-22 mm), and the range of distances 225 from the viewpoints 200a-b to the distant site 205.

[0020] Let d represent the binocular disparity 210, b represent the baseline 215, / represent the focal length 220, and z represent the range of distances 225 from the viewpoints 200a-b to the distant site 205. Then the geometrical relationship between them can be given by: = - (Eq- ¹) b z

Taking the derivative over the range of distances z 225 results in: d_zd = - (Eq. 2) z

The change of disparity relative to distance Ad/Az observed by a viewer at viewpoints 200a-b can therefore be determined as follo

[0021] As appreciated by one skilled in the art, the change in disparity Ad in Eq. 3 above can be computed for a desired range of distances z. If a multi-view camera is positioned away from the desired range of distances z at a distance z_c, then Eq. 3 can be used to select a baseline length b_c for the camera and a focal length f_c for the lenses used in the camera to achieve approximately the same change of disparity Ad obtained in Eq. 3. In one embodiment, the focal length f_c can be selected to capture the context (e.g., sports game, concert, movie scene, etc.) at the distant site 205. The baseline length b_c for the camera can then be computed as below to achieve approximately the same change of disparity Ad at the desired viewpoints:

In another embodiment, the baseline length b_c and the focal length f_c may be selected together to achieve approximately the same Ad as obtained in Eq. 3. [0022] It is appreciated that approximately matching the change in disparity at the actual camera position to the change in disparity at a desired viewpoint improves the veracity and the roundness of the 3D information that is captured. Roundness describes the degree to which a viewed shape is perceived isotropically in 3D, that is, if a sphere is seen shaped spherically or has unintended spatial view depth distortions. When 3D images are not acquired with this perceptual characteristic of roundness in mind, human faces can look like pancakes or noses can become elongated. Additionally, mismatched focal lengths and baselines may induce impressions of gigantism or miniaturism, where figures in the foreground or background are perceived with an inappropriate size.

[0023] It is also appreciated by those skilled in the art that there is a tradeoff in selecting a baseline for preferred human viewing of 3D image data in that the perspective effects of viewing from a distance cannot be altered in an image-based approach, once captured. This means that a representation of the 3D image data for viewing at some other distance can be set up for a preferred "roundness," but may continue to depict objects as seen from the originating perspective. The result is that the usual projective relationships may deliver conflicting impressions as regards to relative size. To accommodate to this, the discussion presented herein with respect to baseline and focal length selection may be augmented to include a consideration of minimizing the impact of perspective effects. In one example, a guiding metric may be provided such that the resulting size variance may not exceed, say, 10% of the desired true perspective, and the baseline deployed may be selected so as to provide an acceptable roundness with respect to that constraint.

[0024] Referring now to FIG. 3, a flowchart for capturing a panoramic 3D image with a perspective- flexible, viewpoint-synthesizing multi-view camera is described. First, the multi-view camera is positioned a distance away from the image scene (e.g., distant site 1 10) to be captured (300). Next, a baseline length for the camera is selected to achieve a change in disparity corresponding to a viewer at a desired viewpoint (305). The change in disparity is first calculated for a viewer at the desired viewpoint as in Eq. 3 above. The baseline length of the camera (and optionally, the focal length of the lenses used in the camera) is then selected to approximately match the change in disparity as in Eq. 4. Finally, images are captured with the selected baseline length to form a composite panoramic 3D image (310).

[0025] It is appreciated that selecting the baseline length and the focal length of lenses of a multi-view camera to approximately match the change in disparity experienced by the camera to the change in disparity at desired viewpoints enables a director of production or other orchestrator of streaming/recording 3D video to have the multi-view camera positioned at one location yet deliver 3D perceptions to viewers that appear to be from another location (i.e., desired viewpoints). The goal is to allow placement of multi-view 3D cameras away from their ideal positions and provide the appearance of adaptive repositioning but without any physical movement.

[0026] It is also appreciated that the selection of baseline length and focal length of lenses of a multi-view camera can be delegated to a module (e.g., capture and synthesis module 120) rather than an active human operator, thereby reducing cost and increasing flexibility. Instead of having a human operator physically move a 3D camera such as, for example, by moving a boom stand, and changing lenses (e.g., a zoom lens) to provide different perspectives at desired viewpoints, the perception of physical camera movement can be provided by selecting a baseline length and focal length for the multi-view camera to approximately match a change in disparity from the desired viewpoints. The selected baseline length and focal length can then be used to capture a plurality of images that may then be synthesized to form a composite and panoramic 3D image.

[0027] Referring now to FIG. 4, a flowchart for forming a composite and panoramic 3D image in accordance to various embodiments is described. First, a change in disparity for a viewer at a desired viewpoint is computed at a capture and synthesis module connected to a multi-view camera (400). Next, a distance between an actual position of the multi-view camera and a site to be captured is determined (405). A baseline length and a plurality of focal lengths for the multi-view camera are then selected to achieve approximately the same change in disparity for the viewer at the desired viewpoint (410). Lastly, a plurality of images captured using the selected baseline length and the plurality of focal lengths for the multi-view 3D camera are received by the capture and synthesis module connected to the camera (415). The capture and synthesis module then synthesizes the plurality of images to form a composite and panoramic 3D image.

[0028] It is appreciated that the steps illustrated in FIG. 4 can be performed by a capture and synthesis module (e.g., capture and synthesis module 120) connected to a multi- view camera. An example of such a multi-view camera is illustrated in FIG. 5A. Multi-view camera 500 has a plurality of imagers 505 arranged in a grid of a camera housing 510. Each imager may be an image sensor with a given focal length. The baseline 515 of the multi-view camera 500 is represented by the horizontal distance between the farthest-apart imagers in the camera, e.g., imagers 520 and 525.

[0029] Another example multi-view camera is illustrated in FIG. 5B. Multi-view camera 530 has a plurality of imagers 535 arranged in a grid of a camera housing 540. The imagers are interspersed with a plurality of lenses (e.g., zoom lenses, telephoto lenses, etc.) having a plurality of focal lengths, such as, for example, lens 545. The plurality of focal lengths can be selected to capture the context (e.g., sports game, concert, movie scene, etc.) of the scene being captured. In one embodiment, the lenses and respective focal lengths can be selected a priori for the given context, and the capture and synthesis module connected to the camera 535 can be used to select the desired baseline length for the camera.

[0030] It is appreciated that the multi-view cameras illustrated in FIGS. 5A-B operate with a capture and synthesis module (e.g., capture and synthesis module 120) to select the baseline length and focal lengths for the cameras at a given capture site. As such, these multi-view cameras can be referred to as perspective-flexible, viewpoint-synthesizing cameras as they are able to capture different perceptions and synthesize images from different viewpoints without any physical movement.

[0031] As also appreciated by one skilled in the art, the capture and synthesis module connected to the multi-view camera (e.g., cameras 500 and 535) can be implemented in hardware, software, or a combination of both. Referring now to FIG. 6, a computing system for implementing the capture and synthesis module according to the present disclosure is described. The computing system 600 can include a processor 605 and memory resources, such as, for example, the volatile memory 610 and/or the non-volatile memory 615, for executing instructions stored in a tangible non-transitory medium (e.g., volatile memory 610, non-volatile memory 615, and/or computer readable medium 620) and/or an application specific integrated circuit ("ASIC") including logic configured to perform various examples of the present disclosure.

[0032] A machine (e.g., a computing device) can include and/or receive a tangible non-transitory computer-readable medium 620 storing a set of computer-readable instructions (e.g., software) via an input device 625. As used herein, the processor 605 can include one or a plurality of processors such as in a parallel processing system. The memory can include memory addressable by the processor 605 for execution of computer readable instructions. The computer readable medium 620 can include volatile and/or non-volatile memory such as a random access memory ("RAM"), magnetic memory such as a hard disk, floppy disk, and/or tape memory, a solid state drive ("SSD"), flash memory, phase change memory, and so on. i some embodiments, the non- volatile memory 615 can be a local or remote database including a plurality of physical non-volatile memory devices.

[0033] The processor 605 can control the overall operation of the computing system 600. The processor 605 can be connected to a memory controller 630, which can read and/or write data from and/or to volatile memory 610 (e.g., RAM). The memory controller 630 can include an ASIC and/or a processor with its own memory resources (e.g., volatile and/or nonvolatile memory). The volatile memory 610 can include one or a plurality of memory modules (e.g., chips). The processor 605 can be connected to a bus 635 to provide communication between the processor 605, the network connection 640, and other portions of the computing system 600. The non-volatile memory 615 can provide persistent data storage for the computing system 600. Further, the graphics controller 645 can connect to an optional display 650.

[0034] Each computing system 600 can include a computing device including control circuitry such as a processor, a state machine, ASIC, controller, and/or similar machine. As used herein, the indefinite articles "a" and/or "an" can indicate one or more than one of the named object. Thus, for example, "a processor" can include one or more than one processor, such as in a parallel processing arrangement.

[0035] The control circuitry can have a structure that provides a given functionality, and/or execute computer-readable instructions that are stored on a non-transitory computer- readable medium (e.g., the non-transitory computer-readable medium 620). The non- transitory computer-readable medium 620 can be integral, or communicatively coupled, to a computing device, in either a wired or wireless manner. For example, the non-transitory computer-readable medium 620 can be an internal memory, a portable memory, a portable disk, or a memory located internal to another computing resource (e.g., enabling the computer-readable instructions to be downloaded over the Internet).

[0036] The non-transitory computer-readable medium 620 can have computer- readable instructions 655 stored thereon that are executed by the processor 605 to implement a capture and synthesis module 660 according to the present disclosure. The non-transitory computer-readable medium 620, as used herein, can include volatile and/or non-volatile memory. Volatile memory can include memory that depends upon power to store information, such as various types of dynamic random access memory ("DRAM"), among others. Non-volatile memory can include memory that does not depend upon power to store information. Examples of non-volatile memory can include solid state media such as flash memory, EEPROM, and phase change random access memory ("PCRAM"), among others. The non-transitory computer-readable medium 620 can include optical discs, digital video discs ("DVD"), Blu-Ray Discs, compact discs ("CD"), laser discs, and magnetic media such as tape drives, floppy discs, and hard drives, solid state media such as flash memory, EEPROM, PCRAM, as well as any other type of computer-readable media.

[0037] It is appreciated that the previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein. For example, it is appreciated that the present disclosure is not limited to a particular configuration, such as computing system 600.

[0038] Those of skill in the art would further appreciate that the various illustrative modules and steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. For example, the example steps of FIG. 4 may be implemented using software modules, hardware modules or components, or a combination of software and hardware modules or components. Thus, in one embodiment, one or more of the example steps of FIG. 4 may comprise hardware modules or components, i another embodiment, one or more of the steps of FIG. 4 may comprise software code stored on a computer readable storage medium, which is executable by a processor.

[0039] To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, and steps have been described above generally in terms of their functionality (e.g., the capture and synthesis module 660). Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Those skilled in the art may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.

Claims

WHAT IS CLAIMED IS:

1. A perspective-flexible, viewpoint-synthesizing multi-view camera for capturing a panoramic 3D image, comprising:

a housing to arrange a plurality of imagers in a grid; and

a plurality of imagers having a plurality of focal lengths and a baseline length, the plurality of focal lengths and the baseline length selected to provide a change in disparity at a distant site that approximates a change in disparity from a desired viewpoint.

2. The perspective-flexible, viewpoint-synthesizing multi-view camera of claim 1, wherein the change in disparity at the distant site is adjusted by considering perspective size variations.

3. The perspective-flexible, viewpoint-synthesizing multi-view camera of claim 1, wherein the plurality of imagers comprises a plurality of image sensors.

4. The perspective-flexible, viewpoint-synthesizing multi-view camera of claim 3, wherein the plurality of image sensors are interspersed with a plurality of lenses.

5. The perspective-flexible, viewpoint-synthesizing multi-view camera of claim 1, wherein the change in disparity at a desired viewpoint is computed for a viewer at the desired viewpoint.

6. The perspective-flexible, viewpoint-synthesizing multi-view camera of claim 1, wherein the change in disparity at a desired viewpoint is computed according to a distance between the viewpoint and the distant site.

7. The perspective-flexible, viewpoint-synthesizing multi-view camera of claim 1, wherein the baseline length and the plurality of focal lengths are selected according to a distance between the distance site and a position of the multi-view camera.

8. A method for capturing a perspective-flexible, viewpoint-synthesizing multi-view panoramic 3D image, the method comprising:

positioning a multi-view 3D camera at a distance away from a distant site, the multi- view 3D camera having a plurality of imagers with a plurality of focal lengths arranged in a grid;

selecting a baseline length to achieve a change in disparity at the distance away from the distance site that approximately matches a change in disparity from a desired viewpoint; and

capturing a plurality of images with the selected baseline length to form the panoramic 3D image.

9. The method of claim 8, further comprising adjusting the change in disparity at the distance away from the distance site to consider perspective size variations.

10. The method of claim 8, wherein selecting the baseline length comprises determining the change in disparity corresponding to a desired viewpoint.

11. The method of claim 10, wherein determining the change in disparity corresponding to a desired viewpoint comprises determining a baseline separation between a viewer's eyes, a focal length of the viewer's eyes, and a distance between the desired viewpoint and the camera position.

12. The method of claim 8, wherein the baseline length is a function of the distance between the camera and the distant site.

13. A non- transitory computer readable medium having instructions stored thereon executable by a processor to:

compute a change in disparity for a viewer at a desired viewpoint;

determine a distance between a position of a multi-view 3D camera and a distant site; select a baseline length and a plurality of focal lengths for the multi-view 3D camera to achieve a change in disparity at the distant site that approximately matches the change in disparity for the viewer at the desired viewpoint; and receive a plurality of images captured using the selected baseline length and the plurality of focal lengths for the multi-view 3D camera positioned at the distance away from the distance site.

14. The non- transitory computer readable medium of claim 13, further comprising instructions to adjust the change in disparity at the distant site by considering perspective size variations.

15. The non- transitory computer readable medium of claim 13, wherein the instructions to compute a change in disparity for a viewer comprise instructions to determine a baseline separation between the viewer's eyes, a focal length of the viewer's eyes, and a distance between the desired viewpoint and the camera position.

16. The non- transitory computer readable medium of claim 13, wherein the baseline length and the plurality of focal lengths are selected according to the distance between a position of the multi-view 3D camera and the distant site.

17. The non-transitory computer readable medium of claim 13, further comprising instructions to synthesize the plurality of images to form a panoramic 3D image.