US20180205941A1 - Three-dimensional scene reconstruction from set of two dimensional images for consumption in virtual reality - Google Patents

Three-dimensional scene reconstruction from set of two dimensional images for consumption in virtual reality Download PDF

Info

Publication number
US20180205941A1
US20180205941A1 (Application No. US15/489,503)
Authority
US
United States
Prior art keywords
depth
images
panorama
generate
warped
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US15/489,503
Other versions
US10038894B1
Inventor
Johannes Peter Kopf
Lars Peter Johannes Hedman
Richard Szeliski
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Meta Platforms Inc
Original Assignee
Facebook Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Facebook Inc filed Critical Facebook Inc
Priority to US15/489,503 (granted as US10038894B1)
Assigned to FACEBOOK, INC. Assignors: HEDMAN, LARS PETER JOHANNES; SZELISKI, RICHARD; KOPF, JOHANNES PETER
Priority to PCT/US2017/031839 (WO2018136106A1)
Priority to CN201780083870.1A (CN110192222B)
Priority to EP17180515.3A (EP3349176B1)
Priority to US16/018,061 (US20180302612A1)
Publication of US20180205941A1
Publication of US10038894B1
Application granted
Assigned to META PLATFORMS, INC. (change of name from FACEBOOK, INC.)
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H04N13/0282
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20Image signal generators
    • H04N13/282Image signal generators for generating image signals corresponding to three or more geometrical viewpoints, e.g. multi-view systems
    • G06T3/0093
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038Image mosaicing, e.g. composing plane images from plane sub-images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • G06T7/579Depth or shape recovery from multiple images from motion
    • H04N13/0022
    • H04N13/0037
    • H04N13/0221
    • H04N13/026
    • H04N13/0271
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106Processing image signals
    • H04N13/111Transformation of image signals corresponding to virtual viewpoints, e.g. spatial image interpolation
    • H04N13/117Transformation of image signals corresponding to virtual viewpoints, e.g. spatial image interpolation the virtual viewpoint locations being selected by the viewers or determined by viewer tracking
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106Processing image signals
    • H04N13/128Adjusting depth or disparity
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106Processing image signals
    • H04N13/15Processing image signals for colour aspects of image signals
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20Image signal generators
    • H04N13/204Image signal generators using stereoscopic image cameras
    • H04N13/207Image signal generators using stereoscopic image cameras using a single 2D image sensor
    • H04N13/221Image signal generators using stereoscopic image cameras using a single 2D image sensor using the relative movement between cameras and objects
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20Image signal generators
    • H04N13/261Image signal generators with monoscopic-to-stereoscopic image conversion
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20Image signal generators
    • H04N13/271Image signal generators wherein the generated image signals comprise depth maps or disparity maps
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • G06T2207/10012Stereo images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30244Camera pose
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N2013/0074Stereoscopic image analysis
    • H04N2013/0081Depth or disparity estimation from stereoscopic image signals

Definitions

  • This disclosure relates generally to rendering three-dimensional images and more specifically to reconstructing a three-dimensional scene from a set of two-dimensional images.
  • a method, non-transitory computer-readable storage medium, and image reconstruction system generates a three-dimensional image from a plurality of two-dimensional input images.
  • a plurality of input images of a scene is received in which the input images are taken from different vantage points.
  • the input images may have varying amounts of overlap with each other and varying camera orientations.
  • the plurality of input images is processed to generate a sparse reconstruction representation of the scene.
  • the sparse reconstruction representation including a sparse point cloud specifying locations of a plurality of points that correspond to three-dimensional locations of surfaces of objects in the scene.
  • the plurality of input images is processed to generate respective dense reconstruction representations of each of the plurality of input images.
  • each of the respective dense reconstruction representations includes a respective depth image for a corresponding input image, in which the depth image includes both color and depth information.
  • Front surfaces of the depth images are projected using a forward depth test to generate a plurality of front-warped images.
  • Back surfaces of the depth images are projected using an inverted depth test to generate a plurality of back-warped images.
  • the front-warped images and the back-warped images are stitched to generate a two-layer panorama having a front surface panorama and a back surface panorama.
  • the front surface panorama and the back surface panorama are then fused to generate the three-dimensional image comprising a multi-layered geometric mesh suitable for rendering the scene in the three-dimensional space.
  • Embodiments according to the invention are in particular disclosed in the attached claims directed to a method, a storage medium, a system and a computer program product, wherein any feature mentioned in one claim category, e.g. method, can be claimed in another claim category, e.g. system, as well.
  • the dependencies or references back in the attached claims are chosen for formal reasons only. However any subject matter resulting from a deliberate reference back to any previous claims (in particular multiple dependencies) can be claimed as well, so that any combination of claims and the features thereof is disclosed and can be claimed regardless of the dependencies chosen in the attached claims.
  • a method comprises:
  • Processing the plurality of input images to generate the sparse reconstruction representation may comprise:
  • Processing the plurality of input images to generate the respective dense reconstruction representations may comprise:
  • generating a near envelope prior that assigns a cost to estimated depth values in front of a near envelope; and applying a multi-view stereo processing algorithm to estimate the depth values based on a cost function including the near envelope prior.
  • Generating the near envelope prior may comprise:
  • identifying anchor pixels in the plurality of input images that have high confidence depth estimates; propagating the depth estimates of the anchor pixels to other pixels in the plurality of input images to generate approximate depth maps; filtering the approximate depth maps to determine a near envelope; and generating the near envelope prior based on the depth estimates and the near envelope.
  • Stitching the front-warped images and the back-warped images to generate a two-layer panorama may comprise:
  • stitching a depth panorama using depth values from the front-warped images; stitching the front surface panorama using color values from the front-warped images and stitched depth values from the depth panorama; stitching the back surface panorama using color values from the back-warped images and the stitched depth values from the depth panorama; and combining the front surface panorama and the back surface panorama into the two-layer panorama.
  • Fusing the front surface panorama and the back surface panorama may comprise:
  • a method may comprise:
  • the normal map estimating, for each pixel, an angle normal to a surface depicted by the pixel.
  • Generating the normal map may comprise:
  • the plurality of input images may have varying levels of overlap and orientation changes.
  • a non-transitory computer-readable storage medium may store instructions, the instructions when executed by a processor may cause the processor to perform steps including:
  • Processing the plurality of input images to generate the sparse reconstruction representation may comprise:
  • Processing the plurality of input images to generate the respective dense reconstruction representations may comprise:
  • generating a near envelope prior that assigns a cost to estimated depth values in front of a near envelope; and applying a multi-view stereo processing algorithm to estimate the depth values based on a cost function including the near envelope prior.
  • Generating the near envelope prior may comprise:
  • identifying anchor pixels in the plurality of input images that have high confidence depth estimates; propagating the depth estimates of the anchor pixels to other pixels in the plurality of input images to generate approximate depth maps; filtering the approximate depth maps to determine a near envelope; and generating the near envelope prior based on the depth estimates and the near envelope.
  • Stitching the front-warped images and the back-warped images to generate a two-layer panorama may comprise:
  • stitching a depth panorama using depth values from the front-warped images; stitching the front surface panorama using color values from the front-warped images and stitched depth values from the depth panorama; stitching the back surface panorama using color values from the back-warped images and the stitched depth values from the depth panorama; and combining the front surface panorama and the back surface panorama into the two-layer panorama.
  • Fusing the front surface panorama and the back surface panorama may comprise:
  • the instructions, when executed by the processor, may further cause the processor to perform steps including:
  • the normal map estimating, for each pixel, an angle normal to a surface depicted by the pixel.
  • Generating the normal map may comprise:
  • the plurality of input images may have varying levels of overlap and orientation changes.
  • a system may comprise:
  • a processor and a non-transitory computer-readable storage medium storing instructions for generating a three-dimensional image, the instructions when executed by a processor causing the processor to perform steps including: receiving a plurality of input images of a scene taken from different vantage points; processing the plurality of input images to generate a sparse reconstruction representation of the scene, the sparse reconstruction representation including a sparse point cloud specifying locations of a plurality of points that correspond to three-dimensional locations of surfaces of objects in the scene; based in part on the sparse reconstruction representation, processing the plurality of input images to generate respective dense reconstruction representations of each of the plurality of input images, the respective dense reconstruction representations each including respective depth images for the plurality of input images, the depth images including color and depth information; projecting front surfaces of the depth images using a forward depth test to generate a plurality of front-warped images; projecting back surfaces of the depth images using an inverted depth test to generate a plurality of back-warped images; stitching the front-warped images and the back-warped images to generate a two-layer panorama having a front surface panorama and a back surface panorama; and fusing the front surface panorama and the back surface panorama in the two-layer panorama to generate the three-dimensional image comprising a multi-layered geometric mesh suitable for rendering the scene in the three-dimensional space.
  • Stitching the front-warped images and the back-warped images to generate a two-layer panorama may comprise:
  • stitching a depth panorama using depth values from the front-warped images; stitching the front surface panorama using color values from the front-warped images and stitched depth values from the depth panorama; stitching the back surface panorama using color values from the back-warped images and the stitched depth values from the depth panorama; and combining the front surface panorama and the back surface panorama into the two-layer panorama.
  • one or more computer-readable non-transitory storage media may embody software that is operable when executed to perform a method according to the invention or any of the above mentioned embodiments.
  • a system may comprise: one or more processors; and at least one memory coupled to the processors and comprising instructions executable by the processors, the processors operable when executing the instructions to perform a method according to the invention or any of the above mentioned embodiments.
  • a computer program product preferably comprising a computer-readable non-transitory storage media, may be operable when executed on a data processing system to perform a method according to the invention or any of the above mentioned embodiments.
  • FIG. 1 is a block diagram illustrating an example embodiment of a system for generating a three-dimensional image from a plurality of two-dimensional input images.
  • FIG. 2 is a flowchart illustrating an embodiment of a process for generating a sparse reconstruction representation of a scene.
  • FIG. 3 is a flowchart illustrating an embodiment of a process for generating a dense reconstruction representation of a scene.
  • FIG. 4 is a flowchart illustrating an embodiment of a process for fusing a plurality of depth images into a multi-layer panoramic image.
  • FIG. 5 is a flowchart illustrating an embodiment of a process for generating a normal map for a multi-layer panoramic image.
  • a graphics system reconstructs a three-dimensional scene from a set of images of the scene taken from different vantage points.
  • the system processes each image to extract depth information therefrom and then stitches the images (both color and depth information) into a multi-layered panorama that includes at least front and back surface layers.
  • the front and back surface layers are then merged to remove redundancies and create connections between neighboring pixels that are likely to represent the same object, while removing connections between neighboring pixels that are not.
  • the resulting layered panorama with depth information can be rendered using a virtual reality (VR) system, a mobile device, or other computing and display platforms using standard rendering techniques, to enable three-dimensional viewing of the scene.
  • FIG. 1 illustrates a system for reconstructing a three-dimensional scene from a set of images, in accordance with one embodiment.
  • an image capture system 110 e.g., a camera
  • the three-dimensional photo reconstruction system 120 processes the images 115 to generate a three-dimensional renderable panoramic image 125 .
  • the three-dimensional renderable panoramic image 125 is outputted to a three-dimensional renderer 130 that renders a three-dimensional image for display.
  • additional intermediate components may be included that are not expressly shown in FIG. 1 .
  • an image storage (e.g., a database) may be included between the image capture system 110 and the three-dimensional photo reconstruction system 120 to store images 115 until selected for processing.
  • an image storage may exist between the three-dimensional photo reconstruction system 120 and the three-dimensional renderer 130 to store the renderable panoramic image 125 .
  • conventional networking components may facilitate communications between the various systems 110 , 120 , 130 , any intermediate components (e.g., storage systems), or different individual modules of the 3D photo reconstruction system 120 described below.
  • Each of the illustrated systems 110 , 120 , 130 may include one or more processors and a computer-readable storage medium that stores instructions that when executed cause the respective systems to carry out the processes and functions attributed to the systems 110 , 120 , 130 described herein.
  • systems 110 , 120 , 130 or portions thereof may operate on a client computer device, on a cloud server, on an enterprise server, or a combination thereof.
  • the 3D photo reconstruction system 120 operates on one or more cloud servers associated with a social networking system or other online system.
  • a user having an account with the social networking system may upload a set of photos to the social networking system as the input images 115 .
  • the 3D photo reconstruction system 120 may operate on a cloud server to generate the 3D image and provide it to the social networking system for rendering to viewers (e.g., as a profile image or post on the user's profile page).
  • the image capture system 110 may comprise any system capable of taking a set of images of a scene, such as a standalone consumer camera or a camera built into a phone or other mobile device. Such cameras may include, for example, a digital single-lens reflex (DSLR) camera, a 360 degree camera, or other wide field of view camera.
  • the image capture system 110 may also include a camera that is wearable on a human, such as outward facing cameras in a virtual reality (VR) or augmented reality (AR) headset.
  • the image capture system 110 may capture individual still images or video of the scene.
  • the image capture system 110 may capture the set of images in an unstructured manner. For example, the particular positions from which each image is captured do not need to be at precise known positions and the images may have varying amounts of overlap or differences in orientation. Furthermore, the number of images captured may be arbitrary within reasonable upper and lower bounds.
  • the set of images may be taken by moving the image capture system 110 sideways (e.g., at half arm's length) while taking a series of still images.
  • the images may be taken by multiple different cameras positioned at different locations.
  • at least some of the images may overlap with one or more other images.
  • the images captured by the image capture system 110 may be captured quickly in a simple user-friendly manner without requiring special equipment or specialized knowledge about how to position the camera.
  • the image capture system 110 may also include software coupled to the camera that assists the user in capturing images, such as by guiding the user to take a sufficient number of images to capture the scene.
  • the software may be embodied, for example, as instructions on a non-transitory computer-readable storage medium of a camera, client computer device, or on a cloud server communicatively coupled to the camera.
  • the software may use motion or position sensors in the image capture system to record approximate relative positions of the capture for each image and possibly to help guide the user when capturing the images.
  • simultaneous localization and mapping (SLAM) technology may be applied to determine the user's estimated location as the user moves through the scene or moves the image capture system 110 around.
  • a fast panoramic image preview (e.g., a 360 degree image) may be generated and made accessible to the user on the image capture system 110 or other connected device (e.g., any device suitable for displaying image content) to enable the user to see a preview of the views captured at any point during image capture. This may be useful to help guide the user to capture additional views that may be missing from the panoramic image.
  • the image capture system 110 may perform content or face recognition on the captured images to identify faces or other objects of interest.
  • the image capture system 110 may then provide real-time feedback to the user during capture to alert the user if a face or object of interest is detected but insufficient views of that face or object have been captured. Thus, the image capture system 110 may guide the user to ensure that an adequate number of images of faces or objects of interest are obtained so that they may be effectively rendered in the three-dimensional reconstruction.
  • the image capture system 110 may furthermore include software for selecting a set of images that are suitable for processing by the three-dimensional reconstruction system 120 from a larger collection of images. For example, in one embodiment, images that are intended for reconstructing into a three-dimensional renderable image may be tagged with metadata to indicate that they were captured as part of an image capture sequence specifically for this purpose. Alternatively, the image capture system 110 may selectively determine images that are suitable for reconstruction based on a number of criteria, such as being captured within a time threshold or within physical proximity to a certain location. The images may be selected from discrete images or from frames of video captured by the image capture system 110 .
  • the image capture system 110 may include multiple cameras that may be operated by separate users to capture images of a scene.
  • the images may be stored to a common database and may be processed to select suitable images in the same way that images from a single camera may be processed.
  • multiple users may upload images so that several users can share images captured by others and reconstruct the scene using the shared images.
  • social connections may be used to determine which images are available to others. For example, a user may set a privacy setting to indicate which other users can access and use their images, which may be set based on social connections between the users.
  • the image capture system 110 may capture video suitable for three-dimensional reconstruction in which at each time instance, a frame of video is captured from multiple vantage points in the same manner described above. Reference to “images” herein, may therefore also correspond to video frames.
  • one or more pre-processing tasks may be performed on the images.
  • the images may be transcoded into a predefined format and file quality.
  • the set of images is then provided to the three-dimensional (3D) photo reconstruction system 120 , which converts the set of images into a three-dimensional renderable panoramic image with multiple layers and depth information.
  • the three-dimensional photo reconstruction system 120 includes several functional modules for performing this conversion task, including a sparse reconstruction module 122 , a dense reconstruction module 124 , a two-layer fusion module 126 , and a normal map estimation module 128 .
  • the sparse reconstruction module 122 processes the input images 115 to generate a sparse reconstruction representation of the scene.
  • the dense reconstruction module 124 then uses the sparse reconstruction representation to generate depth maps for each of the input images 115 .
  • the dense reconstruction module 124 may apply a modified multi-view stereo algorithm that results in improved depth estimations relative to conventional algorithms executing in this context.
  • the depth images including the original set of images and their corresponding depth maps are provided to the two-layer fusion module 126 .
  • the two-layer fusion module 126 merges the depth images into a multi-layer panorama comprising, for example, a two-layer panoramic mesh that is renderable by the three-dimensional renderer 130 .
  • the normal map estimation module 128 estimates normals for surfaces of the two-layer panoramic mesh that can be used by the renderer 130 to add various effects such as lighting or flooding effects to the rendered image. Processes performed by each of these modules are described in further detail below.
  • the three-dimensional renderer 130 may comprise, for example, a virtual reality (VR) system, a mobile device, or other computing and display platforms that can display rendered three-dimensional content using standard rendering techniques.
  • the above-described modules can operate on a set of images corresponding to an individual time instance to reconstruct a three-dimensional image, or can operate on a sequence of image sets, each image set having a set of video frames from different videos captured from different vantage points.
  • the sparse reconstruction module 122 , dense reconstruction module 124 , two-layer fusion module 126 , normal map estimation module 128 , and renderer 130 may then operate to generate a three-dimensional frame from each image set.
  • the sequence of three-dimensional frames may be stored in sequence as a three-dimensional video of the scene.
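  • As an illustration of the data flow just described (not the patent's implementation), the following minimal Python sketch chains the modules of FIG. 1 for a single image set; the class names, method names, and data containers are hypothetical placeholders rather than an interface defined by the patent.

```python
from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class SparseReconstruction:
    """Output of the sparse reconstruction module (FIG. 2)."""
    poses: List[np.ndarray]   # one 3x4 [R|t] camera matrix per input image (assumed layout)
    points: np.ndarray        # N x 3 sparse point cloud of scene surfaces


@dataclass
class DepthImage:
    """Output of the dense reconstruction module (FIG. 3): color plus depth."""
    color: np.ndarray         # H x W x 3 color values
    depth: np.ndarray         # H x W estimated depths


def reconstruct_3d_photo(images, sparse_module, dense_module, fusion_module, normal_module):
    """Chain the hypothetical modules of FIG. 1 for a single set of input images."""
    sparse = sparse_module.run(images)                        # sparse reconstruction (FIG. 2)
    depth_images = dense_module.run(images, sparse)           # per-image depth maps (FIG. 3)
    two_layer_pano = fusion_module.run(depth_images, sparse)  # two-layer panorama (FIG. 4)
    normal_map = normal_module.run(two_layer_pano)            # optional normal map (FIG. 5)
    return two_layer_pano, normal_map
```

For a video input, the same chain would simply be applied once per time instance, producing one three-dimensional frame per image set.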
  • FIG. 2 illustrates an embodiment of a process performed by the sparse reconstruction module 122 for generating a sparse reconstruction representation from a plurality of input images.
  • a plurality of input images are received 202 .
  • the input images comprise two-dimensional images that may be pre-selected as images that are suitable for reconstruction into the three-dimensional image.
  • the input images include a set of images of a scene captured from different vantage points, with at least some of the input images overlapping other images of the scene.
  • the sparse reconstruction module applies 204 a structure-from-motion algorithm to the input images that reconstructs three-dimensional structure from its projections into the set of input images.
  • An example of a structure-from-motion algorithm that may be suitable for this purpose is COLMAP.
  • the sparse reconstruction module 122 outputs the sparse reconstruction representation that may include a set of camera poses for each image and a point cloud approximation of the scene depicted in the images.
  • the set of camera poses may include, for each image, intrinsic camera parameters such as focal length, image sensor format, and principal point, and extrinsic camera parameters denoting the position and orientation of the camera (relative to the scene) when the image was captured.
  • the point cloud comprises a sparse set of data points in a three-dimensional space that represent external surfaces of objects detected in the scene.
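  • To make the sparse reconstruction output concrete, here is a small illustrative sketch of how a recovered camera pose (extrinsic rotation R and translation t) and intrinsic matrix K can be used to project a world-space point from the point cloud into an image. This is standard pinhole-camera math rather than anything specific to the patent, and the numeric values are arbitrary examples.

```python
import numpy as np


def project_point(X_world, R, t, K):
    """Project a 3-D world point into an image using an extrinsic pose (R, t)
    and an intrinsic matrix K, as recovered by structure-from-motion."""
    X_cam = R @ X_world + t          # world -> camera coordinates
    if X_cam[2] <= 0:
        return None                  # point is behind the camera
    x = K @ (X_cam / X_cam[2])       # perspective division, then intrinsics
    return x[:2]                     # pixel coordinates (u, v)


# Hypothetical example values: focal length 1000 px, principal point (960, 540),
# identity orientation, camera at the origin.
K = np.array([[1000.0, 0.0, 960.0],
              [0.0, 1000.0, 540.0],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
print(project_point(np.array([0.5, 0.2, 4.0]), R, t, K))  # -> [1085.  590.]
```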
  • FIG. 3 illustrates an embodiment of a process performed by the dense reconstruction module 124 for generating the dense reconstruction representations using a near envelope prior.
  • the dense reconstruction representation provided by the process comprises a plurality of depth images corresponding to the original input images.
  • the depth images include the color information from the respective original input images and include a depth map for each image that indicates, for each pixel, an estimated distance to a surface of an object in the scene depicted by that pixel given the viewpoint of the image.
  • generating the dense reconstruction representation may be achieved by first computing 310 the near envelope prior and then applying 320 a modified plane-sweep multi-view stereo (MVS) algorithm that includes the near envelope reconstruction prior as an additional cost function.
  • near-depth hypotheses may be noisy because these points are seen in fewer images and the photo-consistency measure is therefore less reliable. This unreliability makes existing MVS algorithms more likely to fall victim to common stereo pitfalls, such as repeated structures in the scene, slight scene motion, or materials with view-dependent (shiny) appearance.
  • the near envelope reconstruction prior addresses these limitations as described below.
  • the MVS algorithm treats depth estimation as an energy minimization problem, which optimizes the pixel depths $d_i$ by solving a problem of the form $E(d) = \sum_i E_{\text{data}}(i) + \sum_{(i,j) \in \mathcal{N}} E_{\text{smooth}}(i,j)$ (1), where $\mathcal{N}$ denotes the set of neighboring pixel pairs.
  • $E_{\text{smooth}}$ is the product of a color- and a depth-difference cost.
  • $E_{\text{data}}(i) = E_{\text{photo}}(i) + E_{\text{sky}}(i) + E_{\text{sfm}}(i)$ (3)
  • the photo-consistency term $E_{\text{photo}}$ measures the agreement in appearance between a pixel and its projection into multiple other images. This calculation may use the camera poses from the sparse reconstruction module 122 to project pixels into other images in order to determine the agreement in appearance.
  • the sky prior $E_{\text{sky}}$ encourages large depths for pixels that are classified by a sky detector as likely to be part of the sky.
  • the structure-from-motion prior $E_{\text{sfm}}$ encourages the result to stay close to the sparse reconstruction.
  • the dense reconstruction module 124 discretizes the potential depth labels and uses the plane-sweep stereo algorithm to build a cost volume with depth hypotheses for each pixel. While this restricts the algorithm to reconstructing discrete depths without normals, it has the advantage that it can extract a globally optimized solution using a Markov random field (MRF) solver, which can often recover plausible depth for textureless regions using its smoothness term.
  • the dense reconstruction module 124 optimizes the MRF at a reduced resolution, e.g., using a FastPD library for performance reasons.
  • the dense reconstruction module 124 then upscales the result to full resolution with a joint bilateral upsampling filter, e.g., using a weighted median filter instead of averaging to prevent introducing erroneous middle values at depth discontinuities.
  • the near envelope is an estimate of a conservative but tight lower bound $n_i$ on the depth at each pixel. This bound is used to discourage nearby erroneous depths by augmenting the data term $E_{\text{data}}(i)$ in Eq. (3) with an additional cost term $E_{\text{near}}(i)$ (Eq. (4)) that assigns a penalty whenever an estimated depth falls in front of the near envelope.
  • $E_{\text{near}}$ thus penalizes reconstructing depths closer than the near envelope, reducing or eliminating erroneously low depth estimates in the depth maps.
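  • The sketch below illustrates one way such a near-envelope penalty could be folded into a plane-sweep cost volume: every depth hypothesis that lies in front of the per-pixel bound receives an extra constant cost. The constant-penalty form and its weight are assumptions for illustration; the exact form of Eq. (4) is not reproduced here.

```python
import numpy as np


def add_near_envelope_prior(cost_volume, depth_hypotheses, near_envelope, penalty=1.0):
    """Penalize depth hypotheses that lie in front of the per-pixel near envelope.

    cost_volume:       H x W x D array of data costs from plane-sweep stereo
    depth_hypotheses:  D discrete candidate depths (one cost-volume slice each)
    near_envelope:     H x W conservative lower bound n_i on the true depth
    penalty:           assumed constant cost added to violating hypotheses
    """
    # Boolean mask: hypothesis d is "too close" wherever d < n_i for that pixel.
    too_close = depth_hypotheses[None, None, :] < near_envelope[:, :, None]
    return cost_volume + penalty * too_close


# Toy usage: a 2x2 image with three depth hypotheses.
cost = np.zeros((2, 2, 3))
hyps = np.array([1.0, 2.0, 4.0])
envelope = np.array([[1.5, 0.5],
                     [3.0, 2.5]])
print(add_near_envelope_prior(cost, hyps, envelope)[0, 0])  # -> [1. 0. 0.]
```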
  • To compute the near envelope $n$, the dense reconstruction module 124 first identifies 302 anchor pixels with reliable depth estimates.
  • the dense reconstruction module 124 computes the anchors from two sources: (1) stereo matching where it is reliable, and (2) the point features already computed in the point cloud during the sparse reconstruction.
  • Stereo matching is known to work well at strong image gradients. This is exploited by computing a lightweight MVS result and carefully considering the pixels around image edges as anchor candidates.
  • the dense reconstruction module 124 obtains a noisy depth map by independently computing the minimum of $E_{\text{data}}$ for each pixel, i.e., dropping the smoothness term from Eq. (1).
  • the dense reconstruction module 124 uses a geometric consistency filter with aggressive settings to discard a significant portion of the incorrectly estimated depth values.
  • An edge detector is used to compute an edge mask.
  • the dense reconstruction module 124 also adds all the sparse point features that are observed in the image to the set of anchor pixels.
  • the dense reconstruction module 124 propagates 304 the depths of the anchor pixels to the remaining pixels to generate an approximate depth map.
  • the sparse anchor depths are spread to the remaining pixels by solving a first-order Poisson system, similar to the one used in the colorization algorithm, in which each anchor constraint is weighted by
  • $w_i = \begin{cases} 0.5 & \text{if } x_i \text{ is a feature observed in 2 images,} \\ 2 & \text{if } x_i \text{ is a feature observed in 3 images,} \\ 10 & \text{if } x_i \text{ is a feature observed in 4 or more images, or is an edge depth.} \end{cases}$  (6)
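  • As a rough sketch of the propagation step, the code below spreads sparse anchor depths over the image by solving a weighted first-order least-squares (Poisson-like) system with scipy. The exact system of Eq. (5) is not reproduced above, so this particular data/smoothness formulation is an assumption; the anchor weights play the role of $w_i$ from Eq. (6).

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla


def path_laplacian(n):
    """Laplacian of a 1-D path graph with n nodes."""
    main = np.full(n, 2.0)
    main[0] = main[-1] = 1.0
    off = -np.ones(n - 1)
    return sp.diags([off, main, off], [-1, 0, 1])


def propagate_anchor_depths(anchor_depth, anchor_weight):
    """Spread sparse anchor depths to all pixels with a weighted first-order fit.

    Assumed formulation:  min_d  sum_i w_i (d_i - a_i)^2 + sum_{4-neighbors (i,j)} (d_i - d_j)^2,
    whose normal equations are (L + diag(w)) d = w * a, with L the grid Laplacian.
    """
    h, w = anchor_depth.shape
    L = sp.kron(sp.identity(h), path_laplacian(w)) + sp.kron(path_laplacian(h), sp.identity(w))
    A = (L + sp.diags(anchor_weight.ravel())).tocsc()
    b = (anchor_weight * anchor_depth).ravel()
    return spla.spsolve(A, b).reshape(h, w)


# Toy usage: two anchors on a 4x4 grid, weighted per Eq. (6).
a = np.zeros((4, 4))
wgt = np.zeros((4, 4))
a[0, 0], wgt[0, 0] = 1.0, 10.0   # strong anchor (edge depth or feature in >= 4 images)
a[3, 3], wgt[3, 3] = 3.0, 0.5    # weak anchor (feature observed in only 2 images)
print(propagate_anchor_depths(a, wgt).round(2))
```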
  • the dense reconstruction module 124 then filters 306 the approximate depth maps to determine the near envelope. For example, the dense reconstruction module 124 makes the propagated depths more conservative by multiplying them with a constant factor (e.g., 0.6) and subsequently applying a morphological minimum filter with diameter set to about 10% of the image diagonal. The result is further cleaned up by smoothing with a wide Gaussian kernel ($\sigma$ set to about 5% of the diagonal).
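  • Below is a minimal sketch of this filtering step, assuming the constants quoted above (a 0.6 scale factor, a minimum filter of roughly 10% of the image diagonal, and a Gaussian with sigma of roughly 5% of the diagonal); a square filter footprint is used here as a stand-in for the circular diameter described above.

```python
import numpy as np
from scipy import ndimage


def near_envelope_from_propagated(depth, scale=0.6, min_frac=0.10, sigma_frac=0.05):
    """Turn a propagated (approximate) depth map into a conservative near envelope.

    Steps follow the description above: scale depths down, take a wide
    morphological minimum, then smooth with a wide Gaussian.  The kernel sizes
    are tied to the image diagonal; exact values here are assumptions.
    """
    h, w = depth.shape
    diag = float(np.hypot(h, w))
    conservative = depth * scale                                   # e.g., x 0.6
    footprint = max(3, int(round(min_frac * diag)))                # ~10% of diagonal
    envelope = ndimage.minimum_filter(conservative, size=footprint)
    envelope = ndimage.gaussian_filter(envelope, sigma=sigma_frac * diag)  # sigma ~5%
    return envelope


# Toy usage on a random 100x150 depth map.
rng = np.random.default_rng(0)
depth = rng.uniform(1.0, 10.0, size=(100, 150))
n = near_envelope_from_propagated(depth)
print(float(n.min()), float(n.max()))
```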
  • the near envelope prior $E_{\text{near}}(i)$ is then computed 308 from the near envelope $n$ based on Eq. (4) above.
  • various image processing techniques may be applied to detect and correct errors in the images that may otherwise cause the densification algorithm to fail.
  • FIG. 4 illustrates an embodiment of a process performed by the two-layer fusion module 126 to generate the two-layer panorama.
  • the two-layer fusion module 126 warps each depth map into a central panoramic image (using an equirectangular projection) by triangulating it into a grid-mesh and rendering it with the normal rasterization pipeline, letting the depth test select the front surface when multiple points fall onto the same panorama pixel.
  • the warping may utilize the camera poses estimated by the sparse reconstruction module 122 .
  • the camera poses and depth maps may be used together to place each pixel at a world-space position for creating the grid-mesh.
  • a z-buffering technique may be used in which graphics are rendered incrementally (i.e., one triangle after another) into a color buffer and a z-buffer is used to store the depth of every pixel rendered.
  • When a new triangle is rendered, each of its pixels' z-values is depth-tested against the z-buffer, and a pixel is only written if its depth is smaller than the current value in the z-buffer.
  • With this technique, only the front-most surfaces appear in the final image.
  • Since the system operates to reconstruct not just the first visible surface but several depth layers, it generates a second, back-surface warp for each image.
  • One possible way to generate these is depth peeling; however, this method is best suited for systems that can depend on having very accurate depth maps available. More robust results with less reliable depth maps can be achieved, instead, by assuming a depth complexity of two and projecting surfaces using different depth tests prior to stitching.
  • each depth map is first projected 402 into an equirectangular projection, using the warped depths z′ in a forward depth test (keeping the nearest surface at each panorama pixel), to generate a front-warped image for each of the depth images; projecting each depth map again with the depth test inverted (keeping the farthest surface) produces the corresponding back-warped image.
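  • The following sketch emulates the two depth tests on already-projected samples: keeping the nearest depth per panorama pixel mimics the forward test used for the front-warped image, and keeping the farthest depth mimics the inverted test used for the back-warped image. It splats individual samples rather than rasterizing a triangulated grid-mesh, and the function and parameter names are illustrative, not from the patent.

```python
import numpy as np


def splat_with_depth_test(pano_shape, pixels_uv, colors, depths, keep="nearest"):
    """Warp already-projected samples into a panorama with an explicit depth test.

    pixels_uv: N x 2 integer panorama coordinates for each source sample
    keep:      "nearest" emulates the forward depth test (front surface);
               "farthest" emulates the inverted depth test (back surface).
    Splatting points instead of rendering triangles is a simplification.
    """
    h, w = pano_shape
    color_buf = np.zeros((h, w, 3))
    z_buf = np.full((h, w), np.inf if keep == "nearest" else -np.inf)
    better = np.less if keep == "nearest" else np.greater

    for (u, v), c, z in zip(pixels_uv, colors, depths):
        if 0 <= v < h and 0 <= u < w and better(z, z_buf[v, u]):
            z_buf[v, u] = z
            color_buf[v, u] = c
    return color_buf, z_buf


# Toy usage: two samples landing on the same panorama pixel.
uv = np.array([[5, 5], [5, 5]])
cols = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
zs = np.array([2.0, 4.0])
front, _ = splat_with_depth_test((10, 10), uv, cols, zs, keep="nearest")
back, _ = splat_with_depth_test((10, 10), uv, cols, zs, keep="farthest")
print(front[5, 5], back[5, 5])   # red (nearer) vs. green (farther)
```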
  • the two-layer fusion module 126 can stitch the warped images into front and back surface panoramas. For scenes where exposure brackets are captured, this is where all the exposures are fused.
  • the stitching described above may be performed by solving a discrete pixel labeling problem, where each pixel i in the panorama chooses the label ⁇ i from one of the warped sources.
  • the labels may be chosen by minimizing the energy of a cost function that may depend on, for example, a stereo confidence weight, a triangle stretch weight, an exposure penalty, a depth penalty, a pairwise color match, and a depth disparity.
  • the two-layer fusion module 126 stitches 406 the depth maps of the front-warped depth images to generate a depth-only panorama. To stitch the depth panorama, the two-layer fusion module 126 optimizes an objective composed of the data and smoothness terms described below.
  • the stereo data term encourages selecting pixels whose depth was estimated with high confidence.
  • for this confidence, the two-layer fusion module 126 uses a maximum-likelihood measure.
  • the color smoothness term $E_{\text{color}}$ is a truncated version of the seam-hiding pairwise cost from "GraphCut Textures": it sums, over neighboring pixels $i$ and $j$, a truncated measure of the color mismatch between the two candidate source images across the seam.
  • an analogous truncated pairwise term $E_{\text{disp}}$ penalizes disparity (depth) mismatch between the candidate sources across the seam.
  • the two-layer fusion module 126 uses it to constrain the subsequent front and back color stitches.
  • the two-layer fusion module 126 stitches 408 the front-warped images using the color values into a foreground panorama.
  • the two-layer fusion module 126 stitches 410 the back-warped images into a background panorama. In performing these stitches, the two-layer fusion module 126 adds two additional data terms that constrain color exposure and depth selection:
  • an exposure term $E_{\text{exposure}}$, which sums a per-pixel exposure penalty over all pixels $i$, weighted by a coefficient $\lambda_5$; and
  • a depth term $E_{\text{depth}} = \lambda_6 \sum_i v_i^{\alpha_i}$.
  • the exposure penalty is 1 if a pixel is over-exposed (except in the darkest bracket) or under-exposed (except in the brightest bracket), and 0 otherwise.
  • the depth penalty $v_i^{\alpha_i}$ can be set differently for the front and back color stitch. For example, for the front color stitch, it is set to 1 only if a pixel's depth is not within a factor of [0.95, 1/0.95] of the depth stitch. For the back color stitch, it is set to 1 only if a pixel's depth is less than a factor of 1/0.95 of the depth stitch.
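  • As an illustration only, the sketch below assembles the per-pixel exposure and depth penalties described above into a unary data cost for the front or back color stitch. The λ weights are placeholders for the balancing coefficients, the exposure-bracket exceptions are assumed to be folded into the input masks, and the pairwise seam terms and the actual label optimization (e.g., with a graph-cut or MRF solver) are omitted.

```python
import numpy as np


def color_stitch_unary_cost(depths, over_exposed, under_exposed, stitched_depth,
                            lam_exposure=1.0, lam_depth=1.0, front=True):
    """Per-pixel, per-source data cost for the front or back color stitch.

    depths:          S x H x W candidate depth of each warped source at each pixel
    over_exposed,
    under_exposed:   S x H x W boolean masks per source (bracket exceptions assumed applied)
    stitched_depth:  H x W depths from the already-stitched depth panorama
    The lambda weights stand in for the balancing coefficients lambda_5 and lambda_6;
    their values here are arbitrary.
    """
    exposure_pen = (over_exposed | under_exposed).astype(float)

    ratio = depths / stitched_depth[None, :, :]
    if front:
        # Penalize sources whose depth is not within [0.95, 1/0.95] of the depth stitch.
        depth_pen = ((ratio < 0.95) | (ratio > 1.0 / 0.95)).astype(float)
    else:
        # Penalize sources that are nearer than 1/0.95 times the depth stitch.
        depth_pen = (ratio < 1.0 / 0.95).astype(float)

    return lam_exposure * exposure_pen + lam_depth * depth_pen


# Toy usage: one source, a 1x1 image, depth matching the stitch exactly.
c = color_stitch_unary_cost(np.array([[[2.0]]]), np.array([[[False]]]),
                            np.array([[[False]]]), np.array([[2.0]]), front=True)
print(c)   # -> [[[0.]]]
```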
  • the two-layer fusion module 126 sets balancing coefficients (such as $\lambda_5$ and $\lambda_6$ above) to weight these terms against one another.
  • the stitched depth panoramas occasionally still contain small floating elements.
  • the two-layer fusion module 126 “despeckles” them by first identifying strong depth discontinuities between neighboring pixels using a depth ratio test, then finding small disconnected components with fewer than 4096 pixels, and removing them by filling their depths in with a median filter.
  • the depth map is then smoothed with a joint bilateral filter whose kernel is cut at the discontinuities computed above, so that smoothing does not bleed across depth edges.
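  • A hedged sketch of the despeckling step described above: it marks strong depth discontinuities with a depth-ratio test, labels connected components that are not separated by discontinuities, and refills components smaller than 4096 pixels from a median-filtered depth map. The ratio threshold, the median window, and the treatment of the discontinuity pixels themselves are illustrative choices, not the patent's exact procedure.

```python
import numpy as np
from scipy import ndimage


def despeckle_depth(depth, ratio_thresh=1.05, min_pixels=4096, median_size=5):
    """Remove small floating depth components (assumes strictly positive depths)."""
    # 1. Depth-ratio test against right and bottom neighbors.
    ratio_x = np.maximum(depth[:, 1:], depth[:, :-1]) / np.minimum(depth[:, 1:], depth[:, :-1])
    ratio_y = np.maximum(depth[1:, :], depth[:-1, :]) / np.minimum(depth[1:, :], depth[:-1, :])
    discont = np.zeros(depth.shape, dtype=bool)
    discont[:, 1:] |= ratio_x > ratio_thresh
    discont[:, :-1] |= ratio_x > ratio_thresh
    discont[1:, :] |= ratio_y > ratio_thresh
    discont[:-1, :] |= ratio_y > ratio_thresh

    # 2. Connected components of "smooth" pixels; discontinuity pixels separate them.
    labels, n = ndimage.label(~discont)
    sizes = ndimage.sum(np.ones_like(depth), labels, index=np.arange(1, n + 1))
    small = np.isin(labels, 1 + np.flatnonzero(sizes < min_pixels))

    # 3. Refill the small components from a median-filtered version of the depth map.
    filled = depth.copy()
    filled[small] = ndimage.median_filter(depth, size=median_size)[small]
    return filled
```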
  • the two-layer fusion module 126 then fuses 412 the foreground panorama and the background panorama into the two-layer representation.
  • the two-layer fusion module 126 represents the two-layer panorama as a graph.
  • Each pixel i in the panorama has up to two nodes that represent the foreground and background layers. If they exist, they are denoted f i and b i respectively.
  • Each node n has a depth value d(n) and a foreground/background label l(n) ∈ {F, B}.
  • the two-layer fusion module 126 independently generates, for the front and back panoramas, fully 4-connected but mutually disjoint grid-graphs.
  • Each node is assigned a depth and label according to the panorama from which it is drawn. These graphs contain redundant coverage of some scene objects, but this is fully intentional and will be useful for removing color fringes around depth discontinuities.
  • For each pair of neighboring pixels i and j, the two-layer fusion module 126 considers all of their existing f and b nodes. It forms all pairs of nodes from i and j and sorts them by their depth ratio, most similar first. It then connects the most similar pair if its depth ratio is above the depth-ratio threshold, and connects any further qualifying pair unless such a connection would cross an already-made edge.
  • At this point, the two-layer fusion module 126 has generated a well-connected two-layer graph.
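  • The sketch below illustrates the greedy, most-similar-first pairing of layer nodes between two neighboring pixels described above; the depth-ratio threshold value and the exact crossing test are assumptions for illustration.

```python
def connect_neighboring_pixels(nodes_i, nodes_j, min_ratio=0.95):
    """Greedily connect layer nodes of two neighboring panorama pixels.

    nodes_i, nodes_j: dicts mapping a layer tag ("f" or "b") to that node's depth.
    Pairs are considered most-similar-first; a pair is connected if its depth ratio
    exceeds `min_ratio`, each node is used at most once, and the new edge would not
    cross an edge that was already made.  Threshold and crossing test are assumed.
    """
    def ratio(a, b):
        lo, hi = sorted((a, b))
        return lo / hi                                   # in (0, 1]; 1 means equal depth

    pairs = sorted(((ratio(di, dj), ni, nj)
                    for ni, di in nodes_i.items()
                    for nj, dj in nodes_j.items()), reverse=True)
    edges = []
    for r, ni, nj in pairs:
        if r < min_ratio:
            break
        if any(ni == a or nj == b for a, b in edges):    # node already connected
            continue
        crosses = any((nodes_i[ni] - nodes_i[a]) * (nodes_j[nj] - nodes_j[b]) < 0
                      for a, b in edges)
        if not crosses:
            edges.append((ni, nj))
    return edges


# Toy usage: both pixels have a front and a back node.
print(connect_neighboring_pixels({"f": 2.0, "b": 5.0}, {"f": 2.1, "b": 9.0}))
# -> [('f', 'f')] : only the two front nodes are similar enough in depth
```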
  • the back layer contains some extra content, but large translations can still reveal holes.
  • the back layer is next expanded to further hallucinate depth and color.
  • the two-layer fusion module 126 expands the back layer in an iterative fashion, one pixel ring at a time. In each iteration, the two-layer fusion module 126 identifies b and f nodes that are not connected in one direction. It tentatively creates a new candidate neighbor node for these pixels and sets their depth and colors to the average of the nodes that spawned them.
  • Candidate nodes are kept only if they are not colliding with already existing nodes (using the same depth-ratio threshold), and if they become connected to the nodes that spawned them.
  • An optional step in the three-dimensional reconstruction pipeline is to compute a normal map.
  • the goal is merely to compute a normal map that looks plausible and good enough for simple artist-driven lighting effects, rather than an accurate normal map of the scene.
  • a process for generating the normal map is illustrated in FIG. 5 .
  • the normal map estimation module 128 generates 502 a first (base) normal map from the depth values in the two-layer panorama that is accurate with respect to surface slopes but not textures on each surface.
  • the normal map estimation module 128 also generates 504 a second (details) map from the luminance values that has surface details that are artistically-driven to produce desired effects in response to lighting.
  • the normal map estimation module 128 then transforms 506 the second normal map (with the texture details) onto the first normal map (with accurate orientation) to get a combined normal map that serves as a good approximation for the scene.
  • the base normal map is piece-wise smooth but discontinuous at depth edges, and contains the correct surface slopes.
  • This normal map is computed by filtering the depth normals with a guided filter, guided by the depth map.
  • a wide window size may be used, corresponding to a solid angle of about 17.5 degrees.
  • the normal map estimation module 128 estimates the detail normal map from the luminance image by hallucinating a depth map just from the image data, assuming surface depth is inversely related to image intensity. While the depth generated in this manner is highly approximate, it is fully consistent with the image data and provides a surprisingly effective means for recovering geometric detail variations.
  • $n_f$ is the normal obtained by applying the guided filter to the depth normals.
  • $n_i$ is the normal obtained from the image-based estimation method.
  • $R_s$ is a local coordinate frame for the image-based normal. It is obtained by setting its first row $R_{s,0}$ to a vector pointing radially outward and completing the frame as $R_{s,1} = R_{s,0} \times w_{\text{up}}$, $R_{s,2} = R_{s,0} \times R_{s,1}$ (16).
  • the resulting normal map can then be provided with the two-layer mesh to a 3D rendering system 130 for viewing by another user.
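  • The sketch below builds a local frame from a first axis and the up vector $w_{\text{up}}$ using the cross products of Eq. (16), and then shows one plausible way to transfer the luminance-derived detail normal onto the base normal by a change of basis, with n_base and n_detail playing the roles of $n_f$ and $n_i$ above. The combination rule and the row normalization are assumptions for illustration rather than the patent's exact transform.

```python
import numpy as np


def local_frame(first_axis, w_up=np.array([0.0, 1.0, 0.0])):
    """Orthonormal frame whose first row is `first_axis`, built per Eq. (16):
    R_1 = R_0 x w_up, R_2 = R_0 x R_1.  Rows are normalized here, and a fallback
    up vector handles the degenerate case where the axis is parallel to w_up
    (both are assumptions for numerical robustness)."""
    r0 = first_axis / np.linalg.norm(first_axis)
    up = w_up if abs(np.dot(r0, w_up)) < 0.99 else np.array([1.0, 0.0, 0.0])
    r1 = np.cross(r0, up)
    r1 /= np.linalg.norm(r1)
    r2 = np.cross(r0, r1)
    return np.stack([r0, r1, r2])            # 3x3, rows are the basis vectors


def combine_normals(n_base, n_detail, view_dir):
    """Assumed combination rule (not the patent's exact transform): read the detail
    normal in the radially-oriented frame of the viewing direction, then re-express
    it in a frame aligned with the filtered base normal."""
    R_s = local_frame(view_dir)               # frame around the radial direction
    R_f = local_frame(n_base)                 # frame around the base normal
    n = R_f.T @ (R_s @ n_detail)              # change of basis: radial -> base frame
    return n / np.linalg.norm(n)


# Toy usage: a surface facing the camera, with a slightly tilted detail normal.
print(combine_normals(n_base=np.array([0.0, 0.0, 1.0]),
                      n_detail=np.array([0.1, 0.0, 1.0]),
                      view_dir=np.array([0.0, 0.0, 1.0])))
```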
  • the above described processes generate a three-dimensional reconstructed image from a set of images of a scene.
  • the above-described processes may similarly be applied to a set of videos of the scene by generating a three-dimensional image from corresponding frames in the set of videos (e.g., at each time instance) and combining the three-dimensional images into a sequence of three-dimensional video frames.
  • a software module is implemented with a computer program product comprising a computer-readable medium containing computer program code, which can be executed by a computer processor for performing any or all of the steps, operations, or processes described.
  • Embodiments of the invention may also relate to an apparatus for performing the operations herein.
  • This apparatus may be specially constructed for the required purposes, and/or it may comprise a general-purpose computing device selectively activated or reconfigured by a computer program stored in the computer.
  • a computer program may be stored in a tangible computer readable storage medium or any type of media suitable for storing electronic instructions, and coupled to a computer system bus.
  • any computing systems referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.
  • Embodiments of the invention may also relate to a computer data signal embodied in a carrier wave, where the computer data signal includes any embodiment of a computer program product or other data combination described herein.
  • the computer data signal is a product that is presented in a tangible medium or carrier wave and modulated or otherwise encoded in the carrier wave, which is tangible, and transmitted according to any suitable transmission method.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Processing Or Creating Images (AREA)

Abstract

To enable better sharing and preservation of immersive experiences, a graphics system reconstructs a three-dimensional scene from a set of images of the scene taken from different vantage points. The system processes each image to extract depth information therefrom and then stitches the images (both color and depth information) into a multi-layered panorama that includes at least front and back surface layers. The front and back surface layers are then merged to remove redundancies and create connections between neighboring pixels that are likely to represent the same object, while removing connections between neighboring pixels that are not. The resulting layered panorama with depth information can be rendered using a virtual reality (VR) system, a mobile device, or other computing and display platforms using standard rendering techniques, to enable three-dimensional viewing of the scene.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application claims the benefit of U.S. Provisional Patent Application No. 62/447,128 filed on Jan. 17, 2017, the content of which is incorporated by reference herein.
  • BACKGROUND
  • This disclosure relates generally to rendering three-dimensional images and more specifically to reconstructing a three-dimensional scene from a set of two-dimensional images.
  • People take pictures of scenes that they experience so they can share their experiences with others or to re-experience the scene at a later time. Unfortunately, technical limitations limit how well the subsequent experience is relived. Two-dimensional images do not provide the full three-dimensional experience of being there, and people usually do not carry around expensive and bulky three-dimensional cameras. Accordingly, it would be useful to enable people to capture a scene and digitally preserve it in a way that allows the person or friends to virtually immerse themselves in the scene and re-experience the sensation of being there at a later time. Preferably, this would be as easy as taking a picture today, using a standard phone or camera.
  • SUMMARY
  • A method, non-transitory computer-readable storage medium, and image reconstruction system generates a three-dimensional image from a plurality of two-dimensional input images. A plurality of input images of a scene is received in which the input images are taken from different vantage points. The input images may have varying amounts of overlap with each other and varying camera orientations. The plurality of input images is processed to generate a sparse reconstruction representation of the scene. The sparse reconstruction representation including a sparse point cloud specifying locations of a plurality of points that correspond to three-dimensional locations of surfaces of objects in the scene. Based in part on the sparse reconstruction representation, the plurality of input images is processed to generate respective dense reconstruction representations of each of the plurality of input images. Here, each of the respective dense reconstruction representations include a respective depth image for a corresponding input image in which the depth image includes both color and depth information. Front surfaces of the depth images are projected using a forward depth test to generate a plurality of front-warped images. Back surfaces of the depth images are projected using an inverted depth test to generate a plurality of back-warped images. The front-warped images and the back-warped images are stitched to generate a two-layer panorama having a front surface panorama and a back surface panorama. The front surface panorama and the back surface panorama are then fused to generate the three-dimensional image comprising a multi-layered geometric mesh suitable for rendering the scene in the three-dimensional space.
  • Embodiments according to the invention are in particular disclosed in the attached claims directed to a method, a storage medium, a system and a computer program product, wherein any feature mentioned in one claim category, e.g. method, can be claimed in another claim category, e.g. system, as well. The dependencies or references back in the attached claims are chosen for formal reasons only. However any subject matter resulting from a deliberate reference back to any previous claims (in particular multiple dependencies) can be claimed as well, so that any combination of claims and the features thereof is disclosed and can be claimed regardless of the dependencies chosen in the attached claims. The subject-matter which can be claimed comprises not only the combinations of features as set out in the attached claims but also any other combination of features in the claims, wherein each feature mentioned in the claims can be combined with any other feature or combination of other features in the claims. Furthermore, any of the embodiments and features described or depicted herein can be claimed in a separate claim and/or in any combination with any embodiment or feature described or depicted herein or with any of the features of the attached claims.
  • In an embodiment according to the invention, a method comprises:
  • receiving a plurality of input images of a scene taken from different vantage points; processing the plurality of input images to generate a sparse reconstruction representation of the scene, the sparse reconstruction representation including a sparse point cloud specifying locations of a plurality of points that correspond to three-dimensional locations of surfaces of objects in the scene;
    based in part on the sparse reconstruction representation, processing the plurality of input images to generate respective dense reconstruction representations of each of the plurality of input images, the respective dense reconstruction representations each including respective depth images for the plurality of input images, the depth images including color and depth information; projecting front surfaces of the depth images using a forward depth test to generate a plurality of front-warped images;
    projecting back surfaces of the depth images using an inverted depth test to generate a plurality of back-warped images;
    stitching the front-warped images and the back-warped images to generate a two-layer panorama having a front surface panorama and a back surface panorama; and
    fusing the front surface panorama and the back surface panorama in the two-layer panorama to generate the three-dimensional image comprising a multi-layered geometric mesh suitable for rendering the scene in the three-dimensional space.
  • Processing the plurality of input images to generate the sparse reconstruction representation may comprise:
  • applying a structure-from-motion algorithm to the plurality of input images.
  • Processing the plurality of input images to generate the respective dense reconstruction representations may comprise:
  • generating a near envelope prior that assigns a cost to estimated depth values in front of a near envelope; and
    applying a multi-view stereo processing algorithm to estimate the depth values based on a cost function including the near envelope prior.
  • Generating the near envelope prior may comprise:
  • identifying anchor pixels in the plurality of input images that have high confidence depth estimates;
    propagating the depth estimates of the anchor pixels to other pixels in the plurality of input images to generate approximate depth maps;
    filtering the approximate depth maps to determine a near envelope; and generating the near envelope prior based on the depth estimates and the near envelope.
  • Stitching the front-warped images and the back-warped images to generate a two-layer panorama may comprise:
  • stitching a depth panorama using depth values from the front-warped images;
    stitching the front surface panorama using color values from the front-warped images and stitched depth values from the depth panorama;
    stitching the back surface panorama using color values from the back-warped images and the stitched depth values from the depth panorama; and
    combining the front surface panorama and the back surface panorama into the two-layer panorama.
  • Fusing the front surface panorama and the back surface panorama may comprise:
  • removing background pixels from the back surface panorama that match corresponding foreground pixels in the front surface panorama;
    storing connections between neighboring pixels meeting a threshold similarity in depth and color information; and
    hallucinating color and depth information in missing pixel locations.
  • In an embodiment according to the invention, a method may comprise:
  • generating a normal map for the multi-layered geometric mesh, the normal map estimating, for each pixel, an angle normal to a surface depicted by the pixel.
  • Generating the normal map may comprise:
  • generating a base normal map from depth values in the three-dimensional image;
    generating a detailed normal map from luminance values in the three-dimensional image; and
    transforming the detailed normal map onto the base normal map to generate a combined normal map.
  • The plurality of input images may have varying levels of overlap and orientation changes.
  • In an embodiment according to the invention, a non-transitory computer-readable storage medium may store instructions, the instructions when executed by a processor may cause the processor to perform steps including:
  • receiving a plurality of input images of a scene taken from different vantage points; processing the plurality of input images to generate a sparse reconstruction representation of the scene, the sparse reconstruction representation including a sparse point cloud specifying locations of a plurality of points that correspond to three-dimensional locations of surfaces of objects in the scene;
    based in part on the sparse reconstruction representation, processing the plurality of input images to generate respective dense reconstruction representations of each of the plurality of input images, the respective dense reconstruction representations each including respective depth images for the plurality of input images, the depth images including color and depth information;
    projecting front surfaces of the depth images using a forward depth test to generate a plurality of front-warped images;
    projecting back surfaces of the depth images using an inverted depth test to generate a plurality of back-warped images;
    stitching the front-warped images and the back-warped images to generate a two-layer panorama having a front surface panorama and a back surface panorama;
    fusing the front surface panorama and the back surface panorama in the two-layer panorama to generate the three-dimensional image comprising a multi-layered geometric mesh suitable for rendering the scene in the three-dimensional space.
  • Processing the plurality of input images to generate the sparse reconstruction representation may comprise:
  • applying a structure-from-motion algorithm to the plurality of input images.
  • Processing the plurality of input images to generate the respective dense reconstruction representations may comprise:
  • generating a near envelope prior that assigns a cost to estimated depth values in front of a near envelope; and
    applying a multi-view stereo processing algorithm to estimate the depth values based on a cost function including the near envelope prior.
  • Generating the near envelope prior may comprise:
  • identifying anchor pixels in the plurality of input images that have high confidence depth estimates;
    propagating the depth estimates of the anchor pixels to other pixels in the plurality of input images to generate approximate depth maps;
    filtering the approximate depth maps to determine a near envelope; and
    generating the near envelope prior based on the depth estimates and the near envelope.
  • Stitching the front-warped images and the back-warped images to generate a two-layer panorama may comprise:
  • stitching a depth panorama using depth values from the front-warped images;
    stitching the front surface panorama using color values from the front-warped images and stitched depth values from the depth panorama;
    stitching the back surface panorama using color values from the back-warped images and the stitched depth values from the depth panorama; and
    combining the front surface panorama and the back surface panorama into the two-layer panorama.
  • Fusing the front surface panorama and the back surface panorama may comprise:
  • removing background pixels from the back surface panorama that match corresponding foreground pixels in the front surface panorama;
    storing connections between neighboring pixels meeting a threshold similarity in depth and color information; and
    hallucinating color and depth information in missing pixel locations.
  • The instructions when executed by the processor may further cause the processor to perform steps including:
  • generating a normal map for the multi-layered geometric mesh, the normal map estimating for each pixel, an angle normal to a surface depicted by the pixel.
  • Generating the normal map may comprise:
  • generating a base normal map from depth values in the three-dimensional image;
    generating a detailed normal map from luminance values in the three-dimensional image; and
    transforming the detailed normal map onto the base normal map to generate a combined normal map.
  • The plurality of input images may have varying levels of overlap and orientation changes.
  • In an embodiment according to the invention, a system may comprise:
  • a processor; and
    a non-transitory computer-readable storage medium storing instructions for generating a three-dimensional image, the instructions when executed by a processor causing the processor to perform steps including:
    receiving a plurality of input images of a scene taken from different vantage points;
    processing the plurality of input images to generate a sparse reconstruction representation of the scene, the sparse reconstruction representation including a sparse point cloud specifying locations of a plurality of points that correspond to three-dimensional locations of surfaces of objects in the scene;
    based in part on the sparse reconstruction representation, processing the plurality of input images to generate respective dense reconstruction representations of each of the plurality of input images, the respective dense reconstruction representations each including respective depth images for the plurality of input images, the depth images including color and depth information;
    projecting front surfaces of the depth images using a forward depth test to generate a plurality of front-warped images;
    projecting back surfaces of the depth images using an inverted depth test to generate a plurality of back-warped images;
    stitching the front-warped images and the back-warped images to generate a two-layer panorama having a front surface panorama and a back surface panorama;
    fusing the front surface panorama and the back surface panorama in the two-layer panorama to generate the three-dimensional image comprising a multi-layered geometric mesh suitable for rendering the scene in the three-dimensional space.
  • Stitching the front-warped images and the back-warped images to generate a two-layer panorama may comprise:
  • stitching a depth panorama using depth values from the front-warped images;
    stitching the front surface panorama using color values from the front-warped images and stitched depth values from the depth panorama;
    stitching the back surface panorama using color values from the back-warped images and the stitched depth values from the depth panorama; and
    combining the front surface panorama and the back surface panorama into the two-layer panorama.
  • In an embodiment according to the invention, one or more computer-readable non-transitory storage media may embody software that is operable when executed to perform a method according to the invention or any of the above mentioned embodiments.
  • In an embodiment according to the invention, a system may comprise: one or more processors; and at least one memory coupled to the processors and comprising instructions executable by the processors, the processors operable when executing the instructions to perform a method according to the invention or any of the above mentioned embodiments.
  • In an embodiment according to the invention, a computer program product, preferably comprising a computer-readable non-transitory storage media, may be operable when executed on a data processing system to perform a method according to the invention or any of the above mentioned embodiments.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a block diagram illustrating an example embodiment of a system for generating a three-dimensional image from a plurality of two-dimensional input images.
  • FIG. 2 is a flowchart illustrating an embodiment of a process for generating a sparse reconstruction representation of a scene.
  • FIG. 3 is a flowchart illustrating an embodiment of a process for generating a dense reconstruction representation of a scene.
  • FIG. 4 is a flowchart illustrating an embodiment of a process for fusing a plurality of depth images into a multi-layer panoramic image.
  • FIG. 5 is a flowchart illustrating an embodiment of a process for generating a normal map for a multi-layer panoramic image.
  • The figures depict embodiments of the present disclosure for purposes of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles, or benefits touted, of the disclosure described herein.
  • DETAILED DESCRIPTION
  • Overview
  • To enable better sharing and preservation of immersive experiences, a graphics system reconstructs a three-dimensional scene from a set of images of the scene taken from different vantage points. The system processes each image to extract depth information therefrom and then stitches the images (both color and depth information) into a multi-layered panorama that includes at least front and back surface layers. The front and back surface layers are then merged to remove redundancies and create connections between neighboring pixels that are likely to represent the same object, while removing connections between neighboring pixels that are not. The resulting layered panorama with depth information can be rendered using a virtual reality (VR) system, a mobile device, or other computing and display platforms using standard rendering techniques, to enable three-dimensional viewing of the scene.
  • System Architecture
  • FIG. 1 illustrates a system for reconstructing a three-dimensional scene from a set of images, in accordance with one embodiment. As depicted, an image capture system 110 (e.g., a camera) is used to take a set of images 115 from different viewing positions in a scene and outputs the images 115 to a three-dimensional (3D) photo reconstruction system 120. The three-dimensional photo reconstruction system 120 processes the images 115 to generate a three-dimensional renderable panoramic image 125. The three-dimensional renderable panoramic image 125 is outputted to a three-dimensional renderer 130 that renders a three-dimensional image for display. In alternative embodiments, additional intermediate components may be included that are not expressly shown in FIG. 1. For example, an image storage (e.g., a database) may be included between the image capture system 110 and the three-dimensional photo reconstruction system 120 to store images 115 until selected for processing. Similarly, an image storage may exist between the three-dimensional photo reconstruction system 120 and the three-dimensional renderer 130 to store the renderable panoramic image 125. Additionally, conventional networking components may facilitate communications between the various systems 110, 120, 130, any intermediate components (e.g., storage systems), or different individual modules of the 3D photo reconstruction system 120 described below. Each of the illustrated systems 110, 120, 130 may include one or more processors and a computer-readable storage medium that stores instructions that when executed cause the respective systems to carry out the processes and functions attributed to the systems 110, 120, 130 described herein. In different embodiments, systems 110, 120, 130 or portions thereof may operate on a client computer device, on a cloud server, on an enterprise server, or a combination thereof. For example, in one embodiment, the 3D photo reconstruction system 120 operates on one or more cloud servers associated with a social networking system or other online system. In this example, a user having an account with the social networking system may upload a set of photos to the social networking system as the input images 115. The 3D photo reconstruction system 120 may operate on a cloud server to generate the 3D image and provide it to the social networking system for rendering to viewers (e.g., as a profile image or post on the user's profile page).
  • The image capture system 110 may comprise any system capable of taking a set of images of a scene, such as a standalone consumer camera or a camera built into a phone or other mobile device. Such cameras may include, for example, a digital single-lens reflex (DSLR) camera, a 360 degree camera, or other wide field of view camera. The image capture system 110 may also include a camera that is wearable on a human, such as outward facing cameras in a virtual reality (VR) or augmented reality (AR) headset. The image capture system 110 may capture individual still images or video of the scene.
  • The image capture system 110 may capture the set of images in an unstructured manner. For example, the particular positions from which each image is captured do not need to be at precise known positions and the images may have varying amounts of overlap or differences in orientation. Furthermore, the number of images captured may be arbitrary within reasonable upper and lower bounds. In one example, the set of images may be taken by moving the image capture system 110 sideways (e.g., at half arm's length) while taking a series of still images. In another embodiment, the images may be taken by multiple different cameras positioned at different locations. Generally, to form a cohesive representation of the captured scene, at least some of the images may overlap with one or more other images. Thus, the images captured by the image capture system 110 may be captured quickly in a simple user-friendly manner without requiring special equipment or specialized knowledge about how to position the camera.
  • The image capture system 110 may also include software coupled to the camera that assists the user in capturing images, such as by guiding the user to take a sufficient number of images to capture the scene. The software may be embodied, for example, as instructions on a non-transitory computer-readable storage medium of a camera, client computer device, or on a cloud server communicatively coupled to the camera. To this end, the software may use motion or position sensors in the image capture system to record approximate relative positions of the capture for each image and possibly to help guide the user when capturing the images. For example, in one embodiment, simultaneous localization and mapping (SLAM) technology may be applied to determine the user's estimated location as the user moves through the scene or moves the image capture system 110 around. The software can then track the views that are obtained and may provide the user guidance as to what additional views may be desirable for the three-dimensional reconstruction. In one embodiment, a fast panoramic image preview (e.g., a 360 degree image) may be generated and made accessible to the user on the image capture system 110 or other connected device (e.g., any device suitable for displaying image content) to enable the user to see a preview of the views captured at any point during image capture. This may be useful to help guide the user to capture additional views that may be missing from the panoramic image. In another embodiment, the image capture system 110 may perform content or face recognition on the captured images to identify faces or other objects of interest. The image capture system 110 may then provide real-time feedback to the user during capture to alert the user if a face or object of interest is detected but insufficient views of that face or object have been captured. Thus, the image capture system 110 may guide the user to ensure that an adequate number of images of faces or objects of interest are obtained so that they may be effectively rendered in the three-dimensional reconstruction.
  • The image capture system 110 may furthermore include software for selecting a set of images that are suitable for processing by the three-dimensional reconstruction system 120 from a larger collection of images. For example, in one embodiment, images that are intended for reconstructing into a three-dimensional renderable image may be tagged with metadata to indicate that they were captured as part of an image capture sequence specifically for this purpose. Alternatively, the image capture system 110 may selectively determine images that are suitable for reconstruction based on a number of criteria, such as being captured within a time threshold or within physical proximity to a certain location. The images may be selected from discrete images or from frames of video captured by the image capture system 110.
  • The image capture system 110 may include multiple cameras that may be operated by separate users to capture images of a scene. The images may be stored to a common database and may be processed to select suitable images in the same way that images from a single camera may be processed. For example, in one scenario, multiple users may upload images so that several users can share images captured by others and reconstruct the scene using the shared images. In a social networking environment, social connections may be used to determine which images are available to others. For example, a user may set a privacy setting to indicate which other users can access and use their images, which may be set based on social connections between the users.
  • In an embodiment, the image capture system 110 may capture video suitable for three-dimensional reconstruction in which at each time instance, a frame of video is captured from multiple vantage points in the same manner described above. Reference to “images” herein may therefore also correspond to video frames.
  • Before the set of images is used by the three-dimensional photo reconstruction system, one or more pre-processing tasks may be performed on the images. For example, the images may be transcoded into a predefined format and file quality.
  • The set of images is then provided to the three-dimensional (3D) photo reconstruction system 120, which converts the set of images into a three-dimensional renderable panoramic image with multiple layers and depth information. The three-dimensional photo reconstruction system 120 includes several functional modules for performing this conversion task, including a sparse reconstruction module 122, a dense reconstruction module 124, a two-layer fusion module 126, and a normal map estimation module 128. The sparse reconstruction module 122 processes the input images 115 to generate a sparse reconstruction representation of the scene. The dense reconstruction module 124 then uses the sparse reconstruction representation to generate depth maps for each of the input images 115. The dense reconstruction module 124 may apply a modified multi-view stereo algorithm that results in improved depth estimations relative to conventional algorithms executing in this context. Once the depth information is densified by the dense reconstruction module 124, the depth images including the original set of images and their corresponding depth maps are provided to the two-layer fusion module 126. The two-layer fusion module 126 merges the depth images into a multi-layer panorama comprising, for example, a two-layer panoramic mesh that is renderable by the three-dimensional renderer 130. The normal map estimation module 128 estimates normals for surfaces of the two-layer panoramic mesh that can be used by the renderer 130 to add various effects such as lighting or flooding effects to the rendered image. Processes performed by each of these modules are described in further detail below.
  • Once the three-dimensional renderable panoramic image is constructed, it may be rendered for viewing by the three-dimensional renderer 130. The three-dimensional renderer 130 may comprise, for example, a virtual reality (VR) system, a mobile device, or other computing and display platforms that can display rendered three-dimensional content using standard rendering techniques.
  • The above-described modules can operate on a set of images corresponding to an individual time instance to reconstruct a three-dimensional image, or can operate on a sequence of image sets, with each of the image sets having a set of video frames from different videos captured from different vantage points. The sparse reconstruction module 122, dense reconstruction module 124, two-layer fusion module 126, normal map estimation module 128, and renderer 130 may then operate to generate a three-dimensional frame from each image set. The sequence of three-dimensional frames may be stored in sequence as a three-dimensional video of the scene.
  • Three-Dimensional Photo Reconstruction
  • Sparse Reconstruction
  • FIG. 2 illustrates an embodiment of a process performed by the sparse reconstruction module 122 for generating a sparse reconstruction representation from a plurality of input images. A plurality of input images are received 202. The input images comprise two-dimensional images that may be pre-selected as images that are suitable for reconstruction into the three-dimensional image. Generally, the input images include a set of images of a scene captured from different vantage points, with at least some of the input images overlapping other images of the scene.
  • The sparse reconstruction module applies 204 a structure-from-motion algorithm to the input images that reconstructs three-dimensional structure from its projections into the set of input images. An example of a structure-from-motion algorithm that may be suitable for this purpose is COLMAP. From the result of the structure-from-motion algorithm, the sparse reconstruction module 122 outputs the sparse reconstruction representation that may include a set of camera poses for each image and a point cloud approximation of the scene depicted in the images. Here, the set of camera poses may include, for each image, intrinsic camera parameters such as focal length, image sensor format, and principal point, and extrinsic camera parameters denoting the position and orientation of the camera (relative to the scene) when the image was captured. The point cloud comprises a sparse set of data points in a three-dimensional space that represent external surfaces of objects detected in the scene.
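To make the sparse reconstruction output concrete, the sketch below shows how recovered camera poses (intrinsics and extrinsics) can be used to project sparse point-cloud points into an image with a simple pinhole model. This is an illustrative sketch only, not the patent's or COLMAP's code; all function and variable names are hypothetical.

```python
import numpy as np

def project_points(points_xyz, R, t, focal, principal_point):
    """Project Nx3 world-space points into one camera using a pinhole model."""
    cam = points_xyz @ R.T + t            # world -> camera coordinates
    in_front = cam[:, 2] > 1e-6           # keep only points in front of the camera
    cam = cam[in_front]
    uv = cam[:, :2] / cam[:, 2:3]         # perspective division
    uv = uv * focal + principal_point     # apply focal length and principal point
    return uv, cam[:, 2]                  # pixel coordinates and depths

# Toy usage: 100 random points roughly 2 units in front of an identity-pose camera.
pts = np.random.rand(100, 3) + np.array([0.0, 0.0, 2.0])
uv, depths = project_points(pts, np.eye(3), np.zeros(3),
                            focal=np.array([500.0, 500.0]),
                            principal_point=np.array([320.0, 240.0]))
```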
  • Dense Reconstruction
  • FIG. 3 illustrates an embodiment of a process performed by the dense reconstruction module 124 for generating the dense reconstruction representations using a near envelope prior. The dense reconstruction representation provided by the process comprises a plurality of depth images corresponding to the original input images. The depth images include the color information from the respective original input images and include a depth map for each image that indicates, for each pixel, an estimated distance to a surface of an object in the scene depicted by that pixel given the viewpoint of the image.
  • In an embodiment, generating the dense reconstruction representation may be achieved by first computing 310 the near envelope prior and then applying 320 a modified plane-sweep multi-view stereo (MVS) algorithm that includes the near envelope reconstruction prior as an additional cost function. In conventional MVS algorithms, near-depth hypotheses may be noisy because these points are seen in fewer images and the photo consistency measure is therefore less reliable. This unreliability makes existing MVS algorithms more likely to fall victim to common stereo pitfalls, such as repeated structures in the scene, slight scene motion, or materials with view-dependent (shiny) appearance. The near envelope reconstruction prior addresses these limitations as described below.
  • The MVS algorithm treats depth estimation as an energy minimization problem, which optimizes the pixel depths $d_i$ by solving the following problem:
  • $\operatorname*{argmin}_{\{d_i\}} \; \sum_i E_{\text{data}}(i) + \lambda_{\text{smooth}} \sum_{(i,j)} E_{\text{smooth}}(i,j) \qquad (1)$
  • where $i$ is a pixel and $c_i$ is its color. The smoothness term $E_{\text{smooth}}$ is the product of a color- and a depth-difference cost,

  • $E_{\text{smooth}}(i,j) = w_{\text{color}}(c_i, c_j)\, w_{\text{depth}}(d_i, d_j) \qquad (2)$
  • where $w_{\text{color}}$ is a color difference cost between colors $c_i$ and $c_j$, and $w_{\text{depth}}$ is a depth difference cost between depths $d_i$ and $d_j$. The smoothness term $E_{\text{smooth}}$ encourages the depth map to be smooth wherever the image lacks texture. The data term combines three costs:

  • $E_{\text{data}}(i) = E_{\text{photo}}(i) + E_{\text{sky}}(i) + E_{\text{sfm}}(i) \qquad (3)$
  • The photo-consistency term $E_{\text{photo}}$ measures the agreement in appearance between a pixel and its projection into multiple other images. This calculation may use the camera poses from the sparse reconstruction module 122 to project pixels into other images in order to determine the agreement in appearance. The sky prior $E_{\text{sky}}$ encourages large depths for pixels that are classified by a sky detector as likely to be part of the sky. The structure-from-motion prior $E_{\text{sfm}}$ encourages the result to stay close to the sparse reconstruction.
  • The dense reconstruction module 124 discretizes the potential depth labels and uses the plane-sweep stereo algorithm to build a cost volume with depth hypotheses for each pixel. While this restricts the algorithm to reconstructing discrete depths without normals, it has the advantage that it can extract a globally optimized solution using a Markov random field (MRF) solver, which can often recover plausible depth for textureless regions using its smoothness term. The dense reconstruction module 124 optimizes the MRF at a reduced resolution, e.g., using a FastPD library for performance reasons. The dense reconstruction module 124 then upscales the result to full resolution with a joint bilateral upsampling filter, e.g., using a weighted median filter instead of averaging to prevent introducing erroneous middle values at depth discontinuities.
  • The near envelope is an estimate of a conservative but tight lower bound $n_i$ for the pixel depth at each pixel. This boundary is used to discourage nearby erroneous depths by augmenting the data term $E_{\text{data}}(i)$ in Eq. (3) to include the following additional cost term in the sum:
  • $E_{\text{near}}(i) = \begin{cases} \lambda_{\text{near}} & \text{if } d_i < n_i \\ 0 & \text{otherwise} \end{cases} \qquad (4)$
  • where $\lambda_{\text{near}}$ is a predefined parameter (e.g., $\lambda_{\text{near}} = 1$). The additional cost term $E_{\text{near}}$ penalizes reconstructing depths closer than the near envelope, thus reducing or eliminating erroneously low depth estimates in the depth maps.
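As a minimal sketch of how the prior of Eq. (4) can enter a plane-sweep cost volume, the code below adds $\lambda_{\text{near}}$ to every depth hypothesis that lies in front of the per-pixel near envelope. The array shapes and names are assumptions made for illustration, not the patent's implementation.

```python
import numpy as np

def add_near_envelope_prior(cost_volume, depth_hypotheses, near_envelope, lambda_near=1.0):
    """Add lambda_near wherever a depth hypothesis lies in front of the near envelope."""
    # cost_volume: (H, W, D); depth_hypotheses: (D,); near_envelope: (H, W)
    too_close = depth_hypotheses[None, None, :] < near_envelope[:, :, None]
    return cost_volume + lambda_near * too_close.astype(cost_volume.dtype)
```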
  • To compute the near envelope $n$, the dense reconstruction module 124 first identifies 302 anchor pixels with reliable depth estimates. The dense reconstruction module 124 computes the anchors from two sources: (1) stereo matching where it is reliable, and (2) the point features already computed in the point cloud during the sparse reconstruction. Stereo matching is known to work well at strong image gradients. This is exploited by computing a lightweight MVS result and carefully considering the pixels around image edges for anchors. The dense reconstruction module 124 obtains a noisy depth map by independently computing the minimum of $E_{\text{data}}$ for each pixel, e.g., dropping the smoothness term from Eq. 1. The dense reconstruction module 124 uses a geometric consistency filter with aggressive settings to discard a significant portion of the incorrectly estimated depth values. An edge detector is used to compute an edge mask. If the image edge coincides with a depth edge, it is desirable to ensure that the depth belonging to the front layer is selected. This may be achieved by dilating the edge mask by 1 pixel and applying a 5×5 morphological minimum filter to the masked pixels inside the detected edges. In addition to these anchors computed from stereo, the dense reconstruction module 124 also adds all the sparse point features that are observed in the image to the set of anchor pixels.
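The sketch below illustrates the edge-based part of this anchor selection under stated assumptions: the noisy depth map, the edge detector output, and the geometric-consistency mask are taken as given inputs, and the dilation and 5×5 minimum filter mirror the steps described above. It is a simplification of the text, not the patent's code.

```python
import numpy as np
from scipy import ndimage

def front_biased_edge_anchors(noisy_depth, edge_mask, valid_mask):
    """Anchor depths near image edges, biased toward the nearer (front) layer."""
    dilated_edges = ndimage.binary_dilation(edge_mask, iterations=1)   # grow the edge mask by 1 pixel
    local_min = ndimage.minimum_filter(noisy_depth, size=5)            # 5x5 morphological minimum
    anchors = np.where(dilated_edges & valid_mask, local_min, np.nan)  # NaN = not an anchor
    return anchors
```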
  • The dense reconstruction module 124 propagates 304 the depths of the anchor pixels to the remaining pixels to generate an approximate depth map. The sparse anchor depths are spread to the remaining pixels by solving a first-order Poisson system, similar to the one used in the colorization algorithm:
  • $\operatorname*{argmin}_{\{x_i\}} \; \sum_i w_i (x_i - x'_i)^2 + \sum_{(i,j)} w_{ij} (x_i - x_j)^2 \qquad (5)$
  • where $x'_i$ are the depths of the anchor pixels (where defined), $x_i$ are the densely propagated depths solved for, $w_{ij} = e^{-(c_i - c_j)^2 / 2\sigma_{\text{env}}^2}$ is the color-based affinity term, and $w_i$ is a unary weight term that represents confidence in each anchor:
  • $w_i = \begin{cases} 0.5 & \text{if } x'_i \text{ is a feature observed in 2 images,} \\ 2 & \text{if } x'_i \text{ is a feature observed in 3 images,} \\ 10 & \text{if } x'_i \text{ is a feature observed in 4 or more images, or an edge depth.} \end{cases} \qquad (6)$
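The quadratic objective in Eq. (5) reduces to a sparse linear system $(\mathrm{diag}(w) + L)\,x = w \odot x'$, where $L$ is the color-weighted graph Laplacian. The sketch below solves that system with scipy.sparse on a 4-connected grid; it is a compact illustration under assumed inputs (anchor depths, anchor confidences, a grayscale guide image), not the module's actual solver.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def propagate_anchor_depths(anchor_depth, anchor_weight, gray, sigma_env=0.1):
    """Solve (diag(w) + L) x = w * x' for densely propagated depths x."""
    h, w = gray.shape
    n = h * w
    idx = np.arange(n).reshape(h, w)
    rows, cols, vals = [], [], []
    diag = anchor_weight.ravel().astype(np.float64)
    rhs = (anchor_weight * np.nan_to_num(anchor_depth)).ravel().astype(np.float64)

    # Color-based affinities w_ij between right and down neighbors (4-connectivity).
    for di, dj in [(0, 1), (1, 0)]:
        ia = idx[: h - di, : w - dj].ravel()
        ib = idx[di:, dj:].ravel()
        cdiff = (gray[: h - di, : w - dj] - gray[di:, dj:]).ravel()
        wij = np.exp(-(cdiff ** 2) / (2.0 * sigma_env ** 2))
        rows.append(np.concatenate([ia, ib, ia, ib]))
        cols.append(np.concatenate([ia, ib, ib, ia]))
        vals.append(np.concatenate([wij, wij, -wij, -wij]))

    L = sp.coo_matrix((np.concatenate(vals),
                       (np.concatenate(rows), np.concatenate(cols))), shape=(n, n))
    A = (L + sp.diags(diag)).tocsr()
    x = spla.spsolve(A, rhs)
    return x.reshape(h, w)
```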
  • The dense reconstruction module 124 then filters 306 the approximate depth maps to determine the near envelope. For example, the dense reconstruction module 124 makes the propagated depths more conservative by multiplying them with a constant factor (e.g., 0.6) and subsequently applying a morphological minimum filter with diameter set to about 10% of the image diagonal. They are further cleaned up by smoothing with a wide Gaussian kernel (σ set to about 5% of the diagonal). The near envelope prior $E_{\text{near}}(i)$ is then computed 308 from the near envelope $n$ based on Eq. (4) above.
  • In one embodiment, various image processing techniques may be applied to detect and correct errors in the images that may otherwise cause the densification algorithm to fail.
  • Two-Layer Fusion
  • FIG. 4 illustrates an embodiment of a process performed by the two-layer fusion module 126 to generate the two-layer panorama. Here, the two-layer fusion module 126 warps each depth map into a central panoramic image (using equirectangular projection) by triangulating it into a grid-mesh and rendering it with the normal rasterization pipeline, letting the depth test select the front surface when multiple points fall onto the same panorama pixel. The warping may utilize the camera poses estimated by the sparse reconstruction module 122. For example, the camera poses and depth maps may be used together to place each pixel at a world-space position for creating the grid-mesh. In this process, a z-buffering technique may be used in which graphics are rendered incrementally (i.e., one triangle after another) into a color buffer and a z-buffer is used to store the depth of every pixel rendered. When a new triangle is rendered, each of its pixels' z-values is depth tested against the z-buffer and only rendered if it has a depth smaller than the current value of the z-buffer. In this technique, only the front-most surfaces appear in the final image.
  • One problem with a simple approach to this task is that long stretched triangles at depth discontinuities connecting foreground and background pixels might obscure other good content, and it is undesirable to include them in the stitch in any case. The problem can be resolved by blending the z-values of the rasterized fragments with a stretch penalty s∈[0, 1] before the depth test, z′=(z+s)/2. The division by 2 keeps the value z′ in normalized clipping space. The stretch penalty,
  • $s = 1 - \min\!\left(\frac{\alpha}{\tau_{\text{stretch}}},\ 1\right) \qquad (7)$
  • considers the grazing angle $\alpha$ from the original viewpoint and penalizes small values below $\tau_{\text{stretch}}$ (e.g., $\tau_{\text{stretch}} = 1.66°$), i.e., rays that are nearly parallel to the triangle surface. This modification pushes highly stretched triangles back, so potentially less stretched back surfaces can win over instead.
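A minimal sketch of this modified depth test, assuming the grazing angle is supplied in degrees and the fragment depth z is already in the normalized [0, 1] range; names and defaults are illustrative.

```python
import numpy as np

def stretch_penalty(grazing_angle_deg, tau_stretch_deg=1.66):
    """Eq. (7): penalize rays nearly parallel to the triangle surface."""
    return 1.0 - np.minimum(grazing_angle_deg / tau_stretch_deg, 1.0)

def blended_depth(z, grazing_angle_deg, tau_stretch_deg=1.66):
    """Blend the fragment depth with the stretch penalty before the depth test."""
    s = stretch_penalty(grazing_angle_deg, tau_stretch_deg)
    return (z + s) / 2.0   # stays in the normalized [0, 1] range
```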
  • Since the system operates to reconstruct not just the first visible surface, but several depth layers, it generates a second back surface warp for each image. One possible way to generate these is depth peeling; however, this method is best suited for systems that can depend on having very accurate depth maps available. More robust results with less reliable depth maps can be achieved, instead, by assuming a depth complexity of two and projecting surfaces using different depth tests prior to stitching.
  • In the process of FIG. 4, the front-most surfaces of each depth map are first projected 402 into an equirectangular projection using z′ for the depth test to generate a front-warped image for each of the depth images. The back-most surfaces of each depth map are projected 404 into an equirectangular projection using z″=1−z′ for the depth test, effectively inverting it so that the back-most surfaces appear in the back-warped image instead of the front-most surfaces.
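The point-splatting sketch below resolves the same warped samples twice, once with the forward test (smallest blended depth wins, giving the front-warped image) and once with the inverted test (back-most surface wins). It assumes flattened target pixel indices, blended depths z', and per-sample colors; a real implementation would rasterize triangles rather than splat points.

```python
import numpy as np

def resolve_two_layers(pix_idx, z_prime, colors, num_pixels):
    """Resolve splatted samples into a front layer (min z') and a back layer (max z')."""
    front = np.full((num_pixels, 3), np.nan); front_z = np.full(num_pixels, np.inf)
    back = np.full((num_pixels, 3), np.nan);  back_z = np.full(num_pixels, -np.inf)
    for p, z, c in zip(pix_idx, z_prime, colors):
        if z < front_z[p]:        # forward depth test -> front-warped image
            front_z[p], front[p] = z, c
        if z > back_z[p]:         # inverted test (z'' = 1 - z') -> back-warped image
            back_z[p], back[p] = z, c
    return front, front_z, back, back_z
```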
  • Once the depth maps are warped into a common perspective (or camera pose), the two-layer fusion module 126 can stitch the warped images into front and back surface panoramas. For scenes where exposure brackets are captured, this is where all the exposures are fused. The colors are linearized using a naive gamma=2.2 assumption, and multiplied with the appropriate exposure factor.
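For the exposure fusion step just described, a one-line sketch of the color linearization under the stated gamma-2.2 assumption (the exposure factor per bracket is assumed to be known):

```python
def linearize(color, exposure_factor, gamma=2.2):
    """Undo the display gamma and scale by the bracket's exposure factor."""
    return (color ** gamma) * exposure_factor
```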
  • The stitching described above may be performed by solving a discrete pixel labeling problem, where each pixel $i$ in the panorama chooses the label $\alpha_i$ from one of the warped sources. The labels may be chosen by minimizing the energy of a cost function that may depend on, for example, a stereo confidence weight, a triangle stretch weight, an exposure penalty, a depth penalty, a pairwise color match, and a depth disparity.
  • For example, in one embodiment, the two-layer fusion module 126 stitches 406 the depth maps of the front-warped depth images to generate a depth-only panorama. To stitch the depth panorama, the two-layer fusion module 126 optimizes this objective:
  • $\operatorname*{argmin}_{\{\alpha_i\}} \; \underbrace{\lambda_1 E_{\text{stereo}} + \lambda_2 E_{\text{stretch}}}_{\text{Data terms}} \; + \; \underbrace{\lambda_3 E_{\text{color}} + \lambda_4 E_{\text{disp}}}_{\text{Smoothness terms}} \qquad (8)$
  • The stereo data term encourages selecting pixels whose depth was estimated with high confidence. The two-layer fusion module 126 uses the maximum-likelihood measure of:
  • $E_{\text{stereo}} = \sum_i -\log \mathrm{MLM}_i^{\alpha_i} \qquad (9)$
  • but only for pixels that pass the geometric consistency test. For inconsistent pixels the two-layer fusion module 126 sets $\mathrm{MLM}_i^{\alpha_i} = 0.001$. The triangle stretch term discourages selecting pixels from long "rubber sheet" triangles:
  • $E_{\text{stretch}} = \sum_i -\log s_i^{\alpha_i} \qquad (10)$
  • The color smoothness term is a truncated version of the seam-hiding pairwise cost from “GraphCut Textures”:
  • $E_{\text{color}} = \sum_{i,j} \min\!\left(\lVert c_i^{\alpha_i} - c_i^{\alpha_j} \rVert_2^2,\ \tau_c\right) + \min\!\left(\lVert c_j^{\alpha_i} - c_j^{\alpha_j} \rVert_2^2,\ \tau_c\right) \qquad (11)$
  • and the disparity smoothness term is a similar truncated term:
  • $E_{\text{disp}} = \sum_{i,j} \min\!\left(\left|\tfrac{1}{d_i^{\alpha_i}} - \tfrac{1}{d_i^{\alpha_j}}\right|,\ \tau_d\right) + \min\!\left(\left|\tfrac{1}{d_j^{\alpha_i}} - \tfrac{1}{d_j^{\alpha_j}}\right|,\ \tau_d\right) \qquad (12)$
  • After having obtained a depth stitch by solving Eq. 8, the two-layer fusion module 126 uses it to constrain the subsequent front and back color stitches. Here, the two-layer fusion module 126 stitches 408 the front-warped images using the color values into a foreground panorama. The two-layer fusion module 126 stitches 410 the back-warped images into a background panorama. In performing these stitches, the two-layer fusion module 126 adds two additional data terms that constrain color exposure and depth selection:
  • $E_{\text{exposure}} = \lambda_5 \sum_i \mu_i^{\alpha_i}, \qquad E_{\text{depth}} = \lambda_6 \sum_i \nu_i^{\alpha_i} \qquad (13)$
  • All penalties $\mu_i^{\alpha_i}$, $\nu_i^{\alpha_i}$ are 0, except in the following conditions. The exposure penalty $\mu_i^{\alpha_i}$ is 1 if a pixel is over-exposed (except in the darkest bracket) or under-exposed (except in the brightest bracket). The depth penalty $\nu_i^{\alpha_i}$ can be set differently for the front and back color stitch. For example, for the front color stitch, it is set to 1 only if a pixel's depth is not within a factor of [0.95, 1/0.95] of the depth stitch. For the back color stitch, it is set to 1 only if a pixel's depth is less than a factor of 1/0.95 of the depth stitch.
  • In an embodiment, the two-layer fusion module 126 sets the following balancing coefficients:

  • $\lambda_1 = 5,\quad \lambda_2 = 50,\quad \lambda_3 = \lambda_4 = \lambda_5 = 100,\quad \lambda_6 = 75 \qquad (14)$
  • and solves the labeling problems at a reduced 512×256 resolution using an alpha expansion algorithm, and upsamples the resulting label map to full resolution (8192×4096 pixels) using a simple PatchMatch-based upsampling algorithm.
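As an illustration of how the data costs for the color stitches can be assembled per pixel and per candidate label (Eqs. 9, 10, and 13, with the balancing coefficients above), a hedged sketch follows. The MLM confidences, stretch penalties, exposure flags, and warped depths are assumed to be precomputed per source; the smoothness terms and the alpha-expansion solver itself are omitted.

```python
import numpy as np

def color_stitch_data_costs(mlm, stretch, exposure_flag, depth, depth_stitch,
                            front_layer=True,
                            lam_stereo=5.0, lam_stretch=50.0,
                            lam_exposure=100.0, lam_depth=75.0):
    """Per-label, per-pixel data costs; arrays are (num_labels, H, W), depth_stitch is (H, W)."""
    e_stereo = -np.log(np.maximum(mlm, 1e-3))       # Eq. (9), with MLM clamped to 0.001
    e_stretch = -np.log(np.maximum(stretch, 1e-6))  # Eq. (10), clamped for numerical safety
    e_exposure = exposure_flag.astype(np.float64)   # mu in Eq. (13)
    ratio = depth / depth_stitch[None, :, :]
    if front_layer:
        nu = ~((ratio >= 0.95) & (ratio <= 1.0 / 0.95))  # outside the tolerance band
    else:
        nu = ratio < 1.0 / 0.95                          # closer than the depth stitch allows
    return (lam_stereo * e_stereo + lam_stretch * e_stretch
            + lam_exposure * e_exposure + lam_depth * nu.astype(np.float64))
```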
  • The stitched depth panoramas occasionally still contain small floating elements. The two-layer fusion module 126 "despeckles" them by first identifying strong depth discontinuities between neighboring pixels using a depth ratio test, then finding small disconnected components with fewer than 4096 pixels, and removing them by filling their depths in with a median filter. In addition, the depth map is smoothed with a joint bilateral filter whose kernel is cut across the discontinuities computed above.
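The sketch below illustrates this despeckling step under assumptions: a hypothetical depth-ratio threshold, union-find connected components that are only joined across edges below that threshold, and a nearest-neighbor fill standing in for the median-filter fill described in the text. It is not the module's implementation.

```python
import numpy as np
from scipy import ndimage

def despeckle_depth(depth, ratio_thresh=1.3, min_component=4096):
    """Remove small depth components isolated by strong discontinuities."""
    h, w = depth.shape
    parent = np.arange(h * w)

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    idx = np.arange(h * w).reshape(h, w)
    # Connect 4-neighbors whose depth ratio stays below the discontinuity threshold.
    for di, dj in [(0, 1), (1, 0)]:
        a = idx[: h - di, : w - dj].ravel()
        b = idx[di:, dj:].ravel()
        ra = depth[: h - di, : w - dj].ravel()
        rb = depth[di:, dj:].ravel()
        ok = np.maximum(ra, rb) / np.minimum(ra, rb) < ratio_thresh
        for x, y in zip(a[ok], b[ok]):
            union(x, y)

    roots = np.array([find(i) for i in range(h * w)])
    ids, counts = np.unique(roots, return_counts=True)
    size_of = dict(zip(ids, counts))
    small = np.array([size_of[r] < min_component for r in roots]).reshape(h, w)

    # Refill the removed speckles from the nearest surviving pixel (a stand-in
    # for the median-filter fill described in the text).
    out = depth.astype(np.float64).copy()
    out[small] = np.nan
    invalid = np.isnan(out)
    _, (ri, ci) = ndimage.distance_transform_edt(invalid, return_indices=True)
    return out[ri, ci]
```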
  • Once the front and back panoramas are generated, the two-layer fusion module 126 then fuses 412 the foreground panorama and the background panorama into the two-layer representation. To do this, the two-layer fusion module 126 represents the two-layer panorama as a graph. Each pixel $i$ in the panorama has up to two nodes that represent the foreground and background layers. If they exist, they are denoted $f_i$ and $b_i$ respectively. Each node $n$ has a depth value $d(n)$ and a foreground/background label $l(n) \in \{F, B\}$. The two-layer fusion module 126 generates, for both front and back panoramas, fully 4-connected but disjoint grid-graphs independently. Each node is assigned a depth and label according to the panorama from which it is drawn. These graphs contain redundant coverage of some scene objects, but this is fully intentional and will be useful to remove color fringes around depth discontinuities. The two-layer fusion module 126 removes the redundancies by removing all the $b_i$ nodes that are too similar to their $f_i$ counterparts, i.e., $d(f_i)/d(b_i) < \tau_{\text{dratio}} = 0.75$.
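A minimal sketch of this redundancy-removal rule, assuming aligned per-pixel depth maps for the front and back panoramas; the function name and the use of NaN to mark removed nodes are illustrative choices.

```python
import numpy as np

def remove_redundant_back_nodes(front_depth, back_depth, tau_dratio=0.75):
    """Drop back-layer nodes whose depth is too similar to the front layer."""
    redundant = (front_depth / back_depth) >= tau_dratio   # d(f_i)/d(b_i) >= tau -> redundant
    cleaned = back_depth.copy()
    cleaned[redundant] = np.nan                            # NaN marks a removed b_i node
    return cleaned
```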
  • The result is not redundant, but now the b graph contains many isolated components, and the f graph contains long connections across discontinuities. The connectivity is then recomputed. For each pair of neighboring pixels, the two-layer fusion module 126 considers all f and b nodes if they exist. The two-layer fusion module 126 also considers all pairs of nodes from $i$ and $j$ and sorts them by their depth ratio, most similar first. Then, it connects the most similar pair if the depth ratio is above $\tau_{\text{dratio}}$. If there is another pair that can be connected, it is connected unless such a connection would cross an edge.
  • To this point, the two-layer fusion module 126 has generated a well-connected two-layer graph. The back layer contains some extra content, but large translations can still reveal holes. The back layer is next expanded by further hallucinating depth and color. The two-layer fusion module 126 expands the back layer in an iterative fashion, one pixel ring at a time. In each iteration, the two-layer fusion module 126 identifies b and f nodes that are not connected in one direction. It tentatively creates a new candidate neighbor node for these pixels and sets their depth and colors to the average of the nodes that spawned them. Candidate nodes are kept only if they are not colliding with already existing nodes (using $\tau_{\text{dratio}}$), and if they become connected to the nodes that spawned them.
  • Normal Map Estimation
  • An optional step in the three-dimensional reconstruction pipeline is to compute a normal map. In one embodiment, the goal is merely to compute a normal that looks plausible and good enough for simple artist-driven lighting effects, rather than an accurate normal map of the scene. A process for generating the normal map is illustrated in FIG. 5. In this embodiment, the normal map estimation module 128 generates 502 a first (base) normal map from the depth values in the two-layer panorama that is accurate with respect to surface slopes but not textures on each surface. The normal map estimation module 128 also generates 504 a second (details) normal map from the luminance values that has surface details that are artistically-driven to produce desired effects in response to lighting. The normal map estimation module 128 then transforms 506 the second normal map (with the texture details) onto the first normal map (with accurate orientation) to get a combined normal map that serves as a good approximation for the scene.
  • In one embodiment, the base normal map is piece-wise smooth but discontinuous at depth edges, and contains the correct surface slopes. This normal map is computed by filtering the depth normals with a guided filter, guided by the depth map. A wide window size may be used that corresponds to a solid angle of about 17.5 degrees.
  • To generate the details map, the normal map estimation module 128 hallucinates from the luminance image, estimating a normal map by hallucinating a depth map just from the image data, assuming surface depth is inversely related to image intensity. While the depth generated in this manner is highly approximate, it is fully consistent with the image data and provides a surprisingly effective means for recovering geometric detail variations.
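A sketch of this luminance-driven detail estimation under the stated assumption that surface depth is inversely related to image intensity: a proxy depth is hallucinated from the luminance and a normal map is derived from its gradients. The scale factor and epsilon are illustrative, not values from the patent.

```python
import numpy as np

def detail_normals_from_luminance(luminance, strength=1.0, eps=1e-3):
    """Hallucinate a proxy depth from intensity and convert its gradients to normals."""
    proxy_depth = 1.0 / (luminance + eps)          # darker pixels read as farther away
    gy, gx = np.gradient(proxy_depth)
    n = np.stack([-strength * gx, -strength * gy, np.ones_like(proxy_depth)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)
```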
  • The two normal maps are then combined. For a given pixel with polar and azimuthal angles $(\theta, \phi)$, $n_f$ is the normal obtained by applying the guided filter to the depth normals, and $n_i$ is the normal obtained from the image-based estimation method. $R_s$ is a local coordinate frame for the image-based normal. It is obtained by setting the first row to a vector pointing radially outward,

  • $R_{s,0} = (\sin\theta \cos\phi,\ \cos\theta,\ \sin\theta \sin\phi) \qquad (15)$
  • and the other rows through cross products with the world up vector $w_{\text{up}}$:
  • $R_{s,1} = \dfrac{R_{s,0} \times w_{\text{up}}}{\lVert R_{s,0} \times w_{\text{up}} \rVert}, \qquad R_{s,2} = R_{s,0} \times R_{s,1} \qquad (16)$
  • A similar coordinate frame $R_f$ is defined for the filtered depth normal by setting $R_{f,0} = -n_f$ and the other rows analogously to Eq. 16. Then, the normal map estimation module 128 transfers the details as follows:

  • $n_c = R_f^{-1} R_s\, n_i \qquad (17)$
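A per-pixel sketch of Eqs. (15)-(17): build the radial frame $R_s$ from the panorama angles, the frame $R_f$ from the filtered depth normal, and transfer the image-based detail normal between them. The world up vector and function names are assumptions for illustration.

```python
import numpy as np

def frame_from_first_row(r0, w_up=np.array([0.0, 1.0, 0.0])):
    """Complete an orthonormal frame whose first row is r0 (Eq. 16)."""
    r1 = np.cross(r0, w_up)
    r1 = r1 / np.linalg.norm(r1)
    r2 = np.cross(r0, r1)
    return np.stack([r0, r1, r2])

def transfer_detail_normal(theta, phi, n_f, n_i):
    """Eq. (17): move the image-based detail normal into the depth-normal frame."""
    r_s0 = np.array([np.sin(theta) * np.cos(phi), np.cos(theta), np.sin(theta) * np.sin(phi)])
    R_s = frame_from_first_row(r_s0)        # Eq. (15)-(16)
    R_f = frame_from_first_row(-n_f)        # analogous frame for the filtered depth normal
    n_c = np.linalg.inv(R_f) @ (R_s @ n_i)
    return n_c / np.linalg.norm(n_c)
```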
  • The resulting normal map can then be provided with the two-layer mesh to a 3D rendering system 130 for viewing by another user.
  • Additional Embodiments
  • The above-described processes generate a three-dimensional reconstructed image from a set of images of a scene. The above-described processes may similarly be applied to a set of videos of the scene by generating a three-dimensional image from corresponding frames in the set of videos (e.g., at each time instance) and combining the three-dimensional images into a sequence of three-dimensional video frames.
  • SUMMARY
  • The foregoing description of the embodiments of the invention has been presented for the purpose of illustration; it is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Persons skilled in the relevant art can appreciate that many modifications and variations are possible in light of the above disclosure.
  • Some portions of this description describe the embodiments of the invention in terms of algorithms and symbolic representations of operations on information. These algorithmic descriptions and representations are commonly used by those skilled in the data processing arts to convey the substance of their work effectively to others skilled in the art. These operations, while described functionally, computationally, or logically, are understood to be implemented by computer programs or equivalent electrical circuits, microcode, or the like. Furthermore, it has also proven convenient at times, to refer to these arrangements of operations as modules, without loss of generality. The described operations and their associated modules may be embodied in software, firmware, hardware, or any combinations thereof.
  • Any of the steps, operations, or processes described herein may be performed or implemented with one or more hardware or software modules, alone or in combination with other devices. In one embodiment, a software module is implemented with a computer program product comprising a computer-readable medium containing computer program code, which can be executed by a computer processor for performing any or all of the steps, operations, or processes described.
  • Embodiments of the invention may also relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, and/or it may comprise a general-purpose computing device selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a tangible computer readable storage medium or any type of media suitable for storing electronic instructions, and coupled to a computer system bus. Furthermore, any computing systems referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.
  • Embodiments of the invention may also relate to a computer data signal embodied in a carrier wave, where the computer data signal includes any embodiment of a computer program product or other data combination described herein. The computer data signal is a product that is presented in a tangible medium or carrier wave and modulated or otherwise encoded in the carrier wave, which is tangible, and transmitted according to any suitable transmission method.
  • Finally, the language used in the specification has been principally selected for readability and instructional purposes, and it may not have been selected to delineate or circumscribe the inventive subject matter. It is therefore intended that the scope of the invention be limited not by this detailed description, but rather by any claims that issue on an application based hereon.

Claims (20)

What is claimed is:
1. A method comprising:
receiving a plurality of input images of a scene taken from different vantage points;
processing the plurality of input images to generate a sparse reconstruction representation of the scene, the sparse reconstruction representation including a sparse point cloud specifying locations of a plurality of points that correspond to three-dimensional locations of surfaces of objects in the scene;
based in part on the sparse reconstruction representation, processing the plurality of input images to generate respective dense reconstruction representations of each of the plurality of input images, the respective dense reconstruction representations each including respective depth images for the plurality of input images, the depth images including color and depth information;
projecting front surfaces of the depth images using a forward depth test to generate a plurality of front-warped images;
projecting back surfaces of the depth images using an inverted depth test to generate a plurality of back-warped images;
stitching the front-warped images and the back-warped images to generate a two-layer panorama having a front surface panorama and a back surface panorama; and
fusing the front surface panorama and the back surface panorama in the two-layer panorama to generate the three-dimensional image comprising a multi-layered geometric mesh suitable for rendering the scene in the three-dimensional space.
2. The method of claim 1, wherein processing the plurality of input images to generate the sparse reconstruction representation comprises:
applying a structure-from-motion algorithm to the plurality of input images.
3. The method of claim 1, wherein processing the plurality of input images to generate the respective dense reconstruction representations comprises:
generating a near envelope prior that assigns a cost to estimated depth values in front of a near envelope; and
applying a multi-view stereo processing algorithm to estimate the depth values based on a cost function including the near envelope prior.
4. The method of claim 3, wherein generating the near envelope prior comprises:
identifying anchor pixels in the plurality of input images that have high confidence depth estimates;
propagating the depth estimates of the anchor pixels to other pixels in the plurality of input images to generate approximate depth maps;
filtering the approximate depth maps to determine a near envelope; and
generating the near envelope prior based on the depth estimates and the near envelope.
5. The method of claim 1, wherein stitching the front-warped images and the back-warped images to generate a two-layer panorama comprises:
stitching a depth panorama using depth values from the front-warped images;
stitching the front surface panorama using color values from the front-warped images and stitched depth values from the depth panorama;
stitching the back surface panorama using color values from the back-warped images and the stitched depth values from the depth panorama; and
combining the front surface panorama and the back surface panorama into the two-layer panorama.
6. The method of claim 1, wherein fusing the front surface panorama and the back surface panorama comprises:
removing background pixels from the back surface panorama that match corresponding foreground pixels in the front surface panorama;
storing connections between neighboring pixels meeting a threshold similarity in depth and color information; and
hallucinating color and depth information in missing pixel locations.
7. The method of claim 1, further comprising:
generating a normal map for the multi-layered geometric mesh, the normal map estimating for each pixel, an angle normal to a surface depicted by the pixel.
8. The method of claim 7, wherein generating the normal map comprises:
generating a base normal map from depth values in the three-dimensional image;
generating a detailed normal map from luminance values in the three-dimensional image; and
transforming the detailed normal map onto the base normal map to generate a combined normal map.
9. The method of claim 1, wherein the plurality of input images have varying levels of overlap and orientation changes.
10. A non-transitory computer-readable storage medium storing instructions, the instructions when executed by a processor causing the processor to perform steps including:
receiving a plurality of input images of a scene taken from different vantage points;
processing the plurality of input images to generate a sparse reconstruction representation of the scene, the sparse reconstruction representation including a sparse point cloud specifying locations of a plurality of points that correspond to three-dimensional locations of surfaces of objects in the scene;
based in part on the sparse reconstruction representation, processing the plurality of input images to generate respective dense reconstruction representations of each of the plurality of input images, the respective dense reconstruction representations each including respective depth images for the plurality of input images, the depth images including color and depth information;
projecting front surfaces of the depth images using a forward depth test to generate a plurality of front-warped images;
projecting back surfaces of the depth images using an inverted depth test to generate a plurality of back-warped images;
stitching the front-warped images and the back-warped images to generate a two-layer panorama having a front surface panorama and a back surface panorama;
fusing the front surface panorama and the back surface panorama in the two-layer panorama to generate the three-dimensional image comprising a multi-layered geometric mesh suitable for rendering the scene in the three-dimensional space.
11. The non-transitory computer-readable storage medium of claim 10, wherein processing the plurality of input images to generate the sparse reconstruction representation comprises:
applying a structure-from-motion algorithm to the plurality of input images.
12. The non-transitory computer-readable storage medium of claim 10, wherein processing the plurality of input images to generate the respective dense reconstruction representations comprises:
generating a near envelope prior that assigns a cost to estimated depth values in front of a near envelope; and
applying a multi-view stereo processing algorithm to estimate the depth values based on a cost function including the near envelope prior.
13. The non-transitory computer-readable storage medium of claim 12, wherein generating the near envelope prior comprises:
identifying anchor pixels in the plurality of input images that have high confidence depth estimates;
propagating the depth estimates of the anchor pixels to other pixels in the plurality of input images to generate approximate depth maps;
filtering the approximate depth maps to determine a near envelope; and
generating the near envelope prior based on the depth estimates and the near envelope.
14. The non-transitory computer-readable storage medium of claim 10, wherein stitching the front-warped images and the back-warped images to generate a two-layer panorama comprises:
stitching a depth panorama using depth values from the front-warped images;
stitching the front surface panorama using color values from the front-warped images and stitched depth values from the depth panorama;
stitching the back surface panorama using color values from the back-warped images and the stitched depth values from the depth panorama; and
combining the front surface panorama and the back surface panorama into the two-layer panorama.
15. The non-transitory computer-readable storage medium of claim 10, wherein fusing the front surface panorama and the back surface panorama comprises:
removing background pixels from the back surface panorama that match corresponding foreground pixels in the front surface panorama;
storing connections between neighboring pixels meeting a threshold similarity in depth and color information; and
hallucinating color and depth information in missing pixel locations.
16. The non-transitory computer-readable storage medium of claim 10, wherein the instructions when executed by the processor further cause the processor to perform steps including:
generating a normal map for the multi-layered geometric mesh, the normal map estimating for each pixel, an angle normal to a surface depicted by the pixel.
17. The non-transitory computer-readable storage medium of claim 16, wherein generating the normal map comprises:
generating a base normal map from depth values in the three-dimensional image;
generating a detailed normal map from luminance values in the three-dimensional image; and
transforming the detailed normal map onto the base normal map to generate a combined normal map.
18. The non-transitory computer-readable storage medium of claim 10, wherein the plurality of input images have varying levels of overlap and orientation changes.
19. A system comprising:
a processor; and
a non-transitory computer-readable storage medium storing instructions for generating a three-dimensional image, the instructions when executed by a processor causing the processor to perform steps including:
receiving a plurality of input images of a scene taken from different vantage points;
processing the plurality of input images to generate a sparse reconstruction representation of the scene, the sparse reconstruction representation including a sparse point cloud specifying locations of a plurality of points that correspond to three-dimensional locations of surfaces of objects in the scene;
based in part on the sparse reconstruction representation, processing the plurality of input images to generate respective dense reconstruction representations of each of the plurality of input images, the respective dense reconstruction representations each including respective depth images for the plurality of input images, the depth images including color and depth information;
projecting front surfaces of the depth images using a forward depth test to generate a plurality of front-warped images;
projecting back surfaces of the depth images using an inverted depth test to generate a plurality of back-warped images;
stitching the front-warped images and the back-warped images to generate a two-layer panorama having a front surface panorama and a back surface panorama;
fusing the front surface panorama and the back surface panorama in the two-layer panorama to generate the three-dimensional image comprising a multi-layered geometric mesh suitable for rendering the scene in the three-dimensional space.
20. The system of claim 19, wherein stitching the front-warped images and the back-warped images to generate a two-layer panorama comprises:
stitching a depth panorama using depth values from the front-warped images;
stitching the front surface panorama using color values from the front-warped images and stitched depth values from the depth panorama;
stitching the back surface panorama using color values from the back-warped images and the stitched depth values from the depth panorama; and
combining the front surface panorama and the back surface panorama into the two-layer panorama.
US15/489,503 2017-01-17 2017-04-17 Three-dimensional scene reconstruction from set of two dimensional images for consumption in virtual reality Active US10038894B1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
US15/489,503 US10038894B1 (en) 2017-01-17 2017-04-17 Three-dimensional scene reconstruction from set of two dimensional images for consumption in virtual reality
PCT/US2017/031839 WO2018136106A1 (en) 2017-01-17 2017-05-09 Three-dimensional scene reconstruction from set of two-dimensional images for consumption in virtual reality
CN201780083870.1A CN110192222B (en) 2017-01-17 2017-05-09 Three-dimensional scene reconstruction from two-dimensional image sets for consumption in virtual reality
EP17180515.3A EP3349176B1 (en) 2017-01-17 2017-07-10 Three-dimensional scene reconstruction from set of two-dimensional images for consumption in virtual reality
US16/018,061 US20180302612A1 (en) 2017-01-17 2018-06-26 Three-dimensional scene reconstruction from set of two dimensional images for consumption in virtual reality

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201762447128P 2017-01-17 2017-01-17
US15/489,503 US10038894B1 (en) 2017-01-17 2017-04-17 Three-dimensional scene reconstruction from set of two dimensional images for consumption in virtual reality

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/018,061 Continuation US20180302612A1 (en) 2017-01-17 2018-06-26 Three-dimensional scene reconstruction from set of two dimensional images for consumption in virtual reality

Publications (2)

Publication Number Publication Date
US20180205941A1 true US20180205941A1 (en) 2018-07-19
US10038894B1 US10038894B1 (en) 2018-07-31

Family

ID=62841768

Family Applications (2)

Application Number Title Priority Date Filing Date
US15/489,503 Active US10038894B1 (en) 2017-01-17 2017-04-17 Three-dimensional scene reconstruction from set of two dimensional images for consumption in virtual reality
US16/018,061 Abandoned US20180302612A1 (en) 2017-01-17 2018-06-26 Three-dimensional scene reconstruction from set of two dimensional images for consumption in virtual reality

Family Applications After (1)

Application Number Title Priority Date Filing Date
US16/018,061 Abandoned US20180302612A1 (en) 2017-01-17 2018-06-26 Three-dimensional scene reconstruction from set of two dimensional images for consumption in virtual reality

Country Status (3)

Country Link
US (2) US10038894B1 (en)
CN (1) CN110192222B (en)
WO (1) WO2018136106A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10552970B2 (en) * 2018-01-12 2020-02-04 Qualcomm Incorporated Efficient guide filter for depth refinement
US10915781B2 (en) * 2018-03-01 2021-02-09 Htc Corporation Scene reconstructing system, scene reconstructing method and non-transitory computer-readable medium
WO2020235979A1 (en) 2019-05-23 2020-11-26 삼성전자 주식회사 Method and device for rendering point cloud-based data
CN113450291B (en) * 2020-03-27 2024-03-01 北京京东乾石科技有限公司 Image information processing method and device
US11636578B1 (en) 2020-05-15 2023-04-25 Apple Inc. Partial image completion
TWI756956B (en) * 2020-12-01 2022-03-01 財團法人工業技術研究院 Image processing method and device for panorama image
US11481871B2 (en) 2021-03-12 2022-10-25 Samsung Electronics Co., Ltd. Image-guided depth propagation for space-warping images
CN113112581A (en) * 2021-05-13 2021-07-13 广东三维家信息科技有限公司 Texture map generation method, device and equipment for three-dimensional model and storage medium
US11741671B2 (en) 2021-06-16 2023-08-29 Samsung Electronics Co., Ltd. Three-dimensional scene recreation using depth fusion

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7085409B2 (en) 2000-10-18 2006-08-01 Sarnoff Corporation Method and apparatus for synthesizing new video and/or still imagery from a collection of real video and/or still imagery
US7856125B2 (en) 2006-01-31 2010-12-21 University Of Southern California 3D face reconstruction from 2D images
JP5039922B2 (en) * 2008-03-21 2012-10-03 インターナショナル・ビジネス・マシーンズ・コーポレーション Image drawing system, image drawing server, image drawing method, and computer program
EP2385705A4 (en) * 2008-12-30 2011-12-21 Huawei Device Co Ltd Method and device for generating stereoscopic panoramic video stream, and method and device of video conference
US8933925B2 (en) * 2009-06-15 2015-01-13 Microsoft Corporation Piecewise planar reconstruction of three-dimensional scenes
US8659597B2 (en) * 2010-09-27 2014-02-25 Intel Corporation Multi-view ray tracing using edge detection and shader reuse
PL2715449T3 (en) * 2011-05-31 2017-06-30 Nokia Technologies Oy Methods, apparatuses and computer program products for generating panoramic images using depth map data
KR101818778B1 (en) * 2012-03-23 2018-01-16 한국전자통신연구원 Apparatus and method of generating and consuming 3d data format for generation of realized panorama image
KR101370718B1 (en) 2012-10-26 2014-03-06 한국과학기술원 Method and apparatus for 2d to 3d conversion using panorama image
US9269187B2 (en) * 2013-03-20 2016-02-23 Siemens Product Lifecycle Management Software Inc. Image-based 3D panorama
US9412172B2 (en) * 2013-05-06 2016-08-09 Disney Enterprises, Inc. Sparse light field representation
CN105308621B 2013-05-29 2019-05-21 王康怀 Reconstructing images from an in vivo multi-camera capsule
US9443330B2 (en) * 2013-06-25 2016-09-13 Siemens Medical Solutions Usa, Inc. Reconstruction of time-varying data
CN105825544B (en) * 2015-11-25 2019-08-20 维沃移动通信有限公司 A kind of image processing method and mobile terminal
EP3340618A1 (en) * 2016-12-22 2018-06-27 Thomson Licensing Geometric warping of a stereograph by positional constraints

Cited By (51)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12020355B2 (en) 2015-07-15 2024-06-25 Fyusion, Inc. Artificially rendering images using viewpoint interpolation and extrapolation
US11956412B2 (en) 2015-07-15 2024-04-09 Fyusion, Inc. Drone based capture of multi-view interactive digital media
US11195314B2 (en) 2015-07-15 2021-12-07 Fyusion, Inc. Artificially rendering images using viewpoint interpolation and extrapolation
US11776199B2 (en) 2015-07-15 2023-10-03 Fyusion, Inc. Virtual reality environment based manipulation of multi-layered multi-view interactive digital media representations
US11636637B2 (en) 2015-07-15 2023-04-25 Fyusion, Inc. Artificially rendering images using viewpoint interpolation and extrapolation
US11632533B2 (en) 2015-07-15 2023-04-18 Fyusion, Inc. System and method for generating combined embedded multi-view interactive digital media representations
US11435869B2 (en) 2015-07-15 2022-09-06 Fyusion, Inc. Virtual reality environment based manipulation of multi-layered multi-view interactive digital media representations
US11095869B2 (en) * 2015-09-22 2021-08-17 Fyusion, Inc. System and method for generating combined embedded multi-view interactive digital media representations
US20180255290A1 (en) * 2015-09-22 2018-09-06 Fyusion, Inc. System and method for generating combined embedded multi-view interactive digital media representations
US11783864B2 (en) 2015-09-22 2023-10-10 Fyusion, Inc. Integration of audio into a multi-view interactive digital media representation
US10165258B2 (en) * 2016-04-06 2018-12-25 Facebook, Inc. Efficient determination of optical flow between images
US10257501B2 (en) 2016-04-06 2019-04-09 Facebook, Inc. Efficient canvas view generation from intermediate views
US11202017B2 (en) 2016-10-06 2021-12-14 Fyusion, Inc. Live style transfer on a mobile device
US11960533B2 (en) 2017-01-18 2024-04-16 Fyusion, Inc. Visual search using multi-view interactive digital media representations
US11876948B2 (en) 2017-05-22 2024-01-16 Fyusion, Inc. Snapshots at predefined intervals or angles
US11776229B2 (en) 2017-06-26 2023-10-03 Fyusion, Inc. Modification of multi-view interactive digital media representation
US11967162B2 (en) 2018-04-26 2024-04-23 Fyusion, Inc. Method and apparatus for 3-D auto tagging
US11488380B2 (en) 2018-04-26 2022-11-01 Fyusion, Inc. Method and apparatus for 3-D auto tagging
US11562537B2 (en) * 2018-11-14 2023-01-24 Manoj Wad System for rapid digitization of an article
US10931929B2 (en) * 2018-12-07 2021-02-23 Xyken, LLC Method and system of discriminative recovery of three-dimensional digital data of a target of interest in a cluttered or controlled environment
CN113692738A (en) * 2019-02-05 2021-11-23 杰瑞·尼姆斯 Method and system for simulating three-dimensional image sequence
US20210374972A1 (en) * 2019-02-20 2021-12-02 Huawei Technologies Co., Ltd. Panoramic video data processing method, terminal, and storage medium
CN110176032A (en) * 2019-04-28 2019-08-27 暗物智能科技(广州)有限公司 A kind of three-dimensional rebuilding method and device
US11568516B2 (en) * 2019-09-12 2023-01-31 Nikon Corporation Depth-based image stitching for handling parallax
CN111210514A (en) * 2019-10-31 2020-05-29 浙江中测新图地理信息技术有限公司 Method for fusing photos into three-dimensional scene in batch
US20200195904A1 (en) * 2020-02-26 2020-06-18 Intel Corporation Depth Based 3D Reconstruction using an A-Priori Depth Scene
US12052408B2 (en) * 2020-02-26 2024-07-30 Intel Corporation Depth based 3D reconstruction using an a-priori depth scene
CN111429566A (en) * 2020-03-20 2020-07-17 广东三维家信息科技有限公司 Reconstruction method and device of virtual home decoration scene and electronic equipment
CN111476907A (en) * 2020-04-14 2020-07-31 青岛小鸟看看科技有限公司 Positioning and three-dimensional scene reconstruction device and method based on virtual reality technology
US11551363B2 (en) * 2020-06-04 2023-01-10 Toyota Research Institute, Inc. Systems and methods for self-supervised residual flow estimation
US20210385390A1 (en) * 2020-06-09 2021-12-09 Canon Kabushiki Kaisha Processing apparatus, processing system, image pickup apparatus, processing method, and memory medium
US11997396B2 (en) * 2020-06-09 2024-05-28 Canon Kabushiki Kaisha Processing apparatus, processing system, image pickup apparatus, processing method, and memory medium
CN111612898A (en) * 2020-06-18 2020-09-01 腾讯科技(深圳)有限公司 Image processing method, image processing device, storage medium and electronic equipment
CN113160398A (en) * 2020-12-25 2021-07-23 中国人民解放军国防科技大学 Rapid three-dimensional grid construction system, method, medium, equipment and unmanned vehicle
CN112862736A (en) * 2021-02-05 2021-05-28 浙江大学 Real-time three-dimensional reconstruction and optimization method based on points
CN112967398A (en) * 2021-03-01 2021-06-15 北京奇艺世纪科技有限公司 Three-dimensional data reconstruction method and device and electronic equipment
US20220343522A1 (en) * 2021-04-16 2022-10-27 Adobe Inc. Generating enhanced three-dimensional object reconstruction models from sparse set of object images
US11669986B2 (en) * 2021-04-16 2023-06-06 Adobe Inc. Generating enhanced three-dimensional object reconstruction models from sparse set of object images
CN113129352A (en) * 2021-04-30 2021-07-16 清华大学 Sparse light field reconstruction method and device
CN113436338A (en) * 2021-07-14 2021-09-24 中德(珠海)人工智能研究院有限公司 Three-dimensional reconstruction method and device for fire scene, server and readable storage medium
US20230136235A1 (en) * 2021-10-28 2023-05-04 Nvidia Corporation 3d surface reconstruction with point cloud densification using artificial intelligence for autonomous systems and applications
US12100230B2 (en) 2021-10-28 2024-09-24 Nvidia Corporation Using neural networks for 3D surface structure estimation based on real-world data for autonomous systems and applications
US12039663B2 (en) 2021-10-28 2024-07-16 Nvidia Corporation 3D surface structure estimation using neural networks for autonomous systems and applications
CN114140510A (en) * 2021-12-03 2022-03-04 北京影谱科技股份有限公司 Incremental three-dimensional reconstruction method and device and computer equipment
CN114419272A (en) * 2022-01-20 2022-04-29 盈嘉互联(北京)科技有限公司 Indoor positioning method based on single photo and BIM
CN114898028A (en) * 2022-04-29 2022-08-12 厦门大学 Scene reconstruction and rendering method based on point cloud, storage medium and electronic equipment
CN114895796A (en) * 2022-07-15 2022-08-12 杭州易绘科技有限公司 Space interaction method and device based on panoramic image and application
CN115861070A (en) * 2022-12-14 2023-03-28 湖南凝服信息科技有限公司 Three-dimensional video fusion splicing method
CN117351170A (en) * 2023-10-09 2024-01-05 北京达美盛软件股份有限公司 Method and system for realizing regional three-dimensional model replacement
CN117994444A (en) * 2024-04-03 2024-05-07 浙江华创视讯科技有限公司 Reconstruction method, device and storage medium of complex scene
CN118365805A (en) * 2024-06-19 2024-07-19 淘宝(中国)软件有限公司 Three-dimensional scene reconstruction method and electronic equipment

Also Published As

Publication number Publication date
CN110192222B (en) 2023-05-23
WO2018136106A1 (en) 2018-07-26
US20180302612A1 (en) 2018-10-18
CN110192222A (en) 2019-08-30
US10038894B1 (en) 2018-07-31

Similar Documents

Publication Publication Date Title
US10038894B1 (en) Three-dimensional scene reconstruction from set of two dimensional images for consumption in virtual reality
EP3349176B1 (en) Three-dimensional scene reconstruction from set of two-dimensional images for consumption in virtual reality
Holynski et al. Fast depth densification for occlusion-aware augmented reality
Philip et al. Multi-view relighting using a geometry-aware network.
Hedman et al. Casual 3D photography
Thies et al. IGNOR: Image-guided neural object rendering
Dolson et al. Upsampling range data in dynamic environments
Zhou et al. Plane-based content preserving warps for video stabilization
Kadambi et al. 3d depth cameras in vision: Benefits and limitations of the hardware: With an emphasis on the first-and second-generation kinect models
US8331614B2 (en) Method and apparatus for tracking listener's head position for virtual stereo acoustics
US8593506B2 (en) Method and system for forming a panoramic image of a scene having minimal aspect distortion
Meilland et al. A unified rolling shutter and motion blur model for 3D visual registration
US11887256B2 (en) Deferred neural rendering for view extrapolation
Taneja et al. Modeling dynamic scenes recorded with freely moving cameras
US11620730B2 (en) Method for merging multiple images and post-processing of panorama
US20150147047A1 (en) Simulating tracking shots from image sequences
CN112184603B (en) Point cloud fusion method and device, electronic equipment and computer storage medium
US12062145B2 (en) System and method for three-dimensional scene reconstruction and understanding in extended reality (XR) applications
WO2008111080A1 (en) Method and system for forming a panoramic image of a scene having minimal aspect distortion
EP2439700B1 (en) Method and Arrangement for Identifying Virtual Visual Information in Images
CN111260544B (en) Data processing method and device, electronic equipment and computer storage medium
Okura et al. Aerial full spherical HDR imaging and display
Pollok et al. Computer vision meets visual analytics: Enabling 4D crime scene investigation from image and video data
Moynihan et al. Spatio-temporal Upsampling for Free Viewpoint Video Point Clouds.
Clark et al. 3D environment capture from monocular video and inertial data

Legal Events

Date Code Title Description
AS Assignment

Owner name: FACEBOOK, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:KOPF, JOHANNES PETER;HEDMAN, LARS PETER JOHANNES;SZELISKI, RICHARD;SIGNING DATES FROM 20170421 TO 20170501;REEL/FRAME:042198/0430

STCF Information on status: patent grant

Free format text: PATENTED CASE

AS Assignment

Owner name: META PLATFORMS, INC., CALIFORNIA

Free format text: CHANGE OF NAME;ASSIGNOR:FACEBOOK, INC.;REEL/FRAME:058897/0824

Effective date: 20211028

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4