US11922562B2 - Methods and systems for rendering view-dependent images using 2D images - Google Patents
- Publication number: US11922562B2 (application US17/644,291)
- Authority
- US
- United States
- Prior art keywords
- rendering
- appearance
- shape
- neural
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/10—Geometric effects
- G06T15/20—Perspective computation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
- G06F3/012—Head tracking input arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T17/20—Finite element generation, e.g. wire-frame surface description, tesselation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/55—Depth or shape recovery from multiple images
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/275—Image signal generators from 3D object models, e.g. computer-generated stereoscopic image signals
- H04N13/279—Image signal generators from 3D object models, e.g. computer-generated stereoscopic image signals the virtual viewpoint locations being selected by the viewers or determined by tracking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Definitions
- the present invention relates to methods and systems for rendering view-dependent images and, more particularly, to methods and systems for generating a plurality of view-dependent images for displays.
- emerging neural implicit scene representations may produce 3D-structure-aware, continuous, memory-efficient representations for shape parts, objects, and/or scenes. These representations may define an object or a scene using a neural network and can be supervised directly with 3D data, such as point clouds, or with 2D multi-view images.
- Systems and methods in accordance with various embodiments of the invention can include a head mounted display including: a display; a processor; and memory.
- the memory may include programming executable by the processor to: receive a plurality of 2D images of a 3D object; provide, to a neural network, the plurality of 2D images; generate a 3D neural model from the plurality of 2D images using a neural network; generate a triangular mesh using the 3D neural model; track head position of a viewer; and render a plurality of view-dependent images using the triangular mesh to generate a left view and a right view based on the head position of the viewer, wherein the head mounted display is configured to display the left view to a left eye of the viewer and the right view to a right eye of the viewer.
- the neural network includes a signed distance function based sinusoidal representation network.
- various embodiments of the invention can include an image rendering system including: a processor and memory.
- the memory may include programming executable by the processor to: receive a plurality of 2D images of a 3D object; provide, to a neural network, the plurality of 2D images; generate a 3D neural model from the plurality of 2D images using a neural network; generate a triangular mesh using the 3D neural model; and render a plurality of view-dependent images using the triangular mesh to generate a left view and a right view based on a head position of a viewer; and display, on a head mounted display, the left view to a left eye of the viewer and the right view to a right eye of the viewer.
- the neural network comprises a signed distance function based sinusoidal representation network.
- various embodiments of the invention can include an image rendering method for generating a plurality of view-dependent images at a display including a plurality of pixels, comprising: obtaining a 3D neural model from image data capturing a 3D shape of an object by obtaining a zero-level set of a signed distance function using a shape renderer of a rendering engine; modeling an appearance of the object by minimizing an image reconstruction error based upon the image data capturing the 3D shape of the object using an appearance renderer of the rendering engine; converting the neural model into a triangular mesh representing the object using the rendering engine; and rendering at least one image using the triangular mesh using the rendering engine.
- the signed distance function is represented by: S(x; θ): ℝ³ → ℝ,
- where x ∈ ℝ³ is a location in 3D space and θ is a first learnable parameter of a sinusoidal representation network.
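For intuition, a signed distance function can be written in closed form for simple shapes. The sketch below (not part of the patent text; the learned network S(x; θ) would replace this analytic function) shows an SDF for a sphere:

```python
import numpy as np

def sphere_sdf(x, radius=0.5):
    """Signed distance from points x of shape (N, 3) to an origin-centered sphere.

    Negative inside the surface, zero on it, positive outside.
    """
    x = np.asarray(x, dtype=float)
    return np.linalg.norm(x, axis=-1) - radius

# The zero-level set {x : S(x) = 0} is the sphere surface itself.
print(sphere_sdf([[0.5, 0.0, 0.0]]))  # -> [0.]
```

The zero-level set of this function plays the same role as the learned surface in the claims above.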
- obtaining the zero-level set of the signed distance function comprises sphere tracing the signed distance function.
- sphere tracing the signed distance function includes:
- modeling the appearance of the object includes using a spatially varying emission function E.
- modeling the appearance further comprises defining the spatially varying emission function E for directions r_d ∈ ℝ³ in a global coordinate system.
- the spatially varying emission function is expressible as: E(x, r_d, n; θ, ϕ): ℝ⁹ → ℝ³,
- where ϕ is a second learnable parameter of the sinusoidal representation network.
- the image rendering method further includes minimizing an image reconstruction error for the 3D object in foreground pixels of the display.
- the image reconstruction error is represented by: L_R = (1/|U|) Σ_{c ∈ I_U^f} ‖E(x, r_d, n; θ, ϕ) − c‖,
- where c is an RGB value of a foreground pixel of the display and U represents a portion of the pixels with RGB values I_U and object masks M_U.
- the image rendering method further includes regularizing the signed distance function by an eikonal constraint.
- the eikonal constraint is represented by: ‖∇_x S(x; θ)‖₂ = 1.
- the image rendering method further includes enforcing a projected pattern to fall within the boundaries of the object masks.
- enforcing the projected pattern includes using a soft mask loss defined for pixels other than the foreground pixels of the display.
- the soft mask loss is represented by:
- L_M = (1/(α|U|)) Σ_{m ∈ M_U \ U_f} BCE(sigmoid(−α S_min), m),
- the image rendering method further includes regularizing the emissivity function to avoid overfitting to training views.
- regularizing the emissivity function comprises linearizing the angular behavior using a smoothness term represented by:
- L_S = (1/|U|) Σ ‖∇²_{r_d} E(x, r_d, n; θ, ϕ)‖₂².
- the image rendering method further includes optimizing parameters θ and ϕ as: θ, ϕ = argmin (L_R + w_E L_E + w_M L_M + w_S L_S),
- where w_E, w_M, and w_S are weights for their respective loss functions.
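The weighted combination of loss terms can be sketched as follows. The weight values here are placeholders; the patent states only that w_E, w_M, and w_S weight their respective loss terms, not what the values are:

```python
def total_loss(l_r, l_e, l_m, l_s, w_e=0.1, w_m=100.0, w_s=0.01):
    """Weighted sum of reconstruction (L_R), eikonal (L_E), mask (L_M),
    and smoothness (L_S) losses. Weight values are illustrative only."""
    return l_r + w_e * l_e + w_m * l_m + w_s * l_s

# Example with hypothetical per-term loss values.
print(round(total_loss(1.0, 0.5, 0.01, 2.0), 2))  # -> 2.07
```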
- the image rendering method further includes rasterizing the triangular mesh; projecting vertex positions to each one of the plurality of pixels; and computing angles α_1 . . . α_N between a ray towards a rendering camera and rays towards each of N projective texture map viewpoints.
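The angle computation in the preceding step can be sketched as below. The viewpoint positions and vertex are hypothetical example values:

```python
import numpy as np

def view_angles(vertex, camera_pos, texture_viewpoints):
    """Angles between the ray from a surface vertex to the rendering camera
    and the rays from that vertex to each of N texture map viewpoints."""
    vertex = np.asarray(vertex, float)
    to_cam = np.asarray(camera_pos, float) - vertex
    to_cam /= np.linalg.norm(to_cam)
    to_views = np.asarray(texture_viewpoints, float) - vertex
    to_views /= np.linalg.norm(to_views, axis=1, keepdims=True)
    cosines = np.clip(to_views @ to_cam, -1.0, 1.0)
    return np.arccos(cosines)  # alpha_1 ... alpha_N in radians

# Viewpoint 0 is collinear with the camera ray (0 deg), viewpoint 1 is orthogonal (90 deg).
angles = view_angles([0, 0, 0], [0, 0, 1.0], [[0, 0, 2.0], [1.0, 0, 0]])
```

These angles could then weight the blending of the per-viewpoint projective textures for the rendered pixel.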
- FIG. 1 A illustrates a light field display that includes a display area formed by an array of angular pixels and one or more eye trackers in accordance with an embodiment of the invention.
- FIG. 1 B illustrates a head mounted display that displays a left image and a right image based on a user's head position in accordance with an embodiment of the invention.
- FIG. 2 illustrates a schematic representation of a neural network pipeline in accordance with an embodiment of the invention.
- FIG. 3 is a block diagram of a computing system which renders view-dependent 2D images in accordance with an embodiment of the invention.
- FIG. 4 is a flowchart of a method for rendering view-dependent 2D images in accordance with an embodiment of the invention.
- FIG. 5 is a frontal view of a custom camera array in accordance with an embodiment of the invention.
- Novel view synthesis and 3D shape estimation from 2D images may include inverse problems of fundamental importance in applications as diverse as photogrammetry, remote sensing, visualization, AR/VR, teleconferencing, visual effects, and/or games. That is, it would be desirable to produce novel views of a 3D object, enabling view-dependent real-time rendering with photorealistic image quality using traditional graphics pipelines.
- Emerging neural scene representations often model an object or scene explicitly using a 3D proxy geometry, such as an imperfect mesh or depth map estimated by multi-view stereo or other means, an object-specific shape template, a multi-plane or multi-sphere image, or a volume.
- state-of-the-art neural volume rendering approaches are slow to train and require minutes of rendering time for high image resolutions. That is, state-of-the-art neural rendering approaches, such as neural radiance fields, typically do not offer real-time frame rates, which severely limits their applicability to the aforementioned problems. This limitation may be primarily imposed by the choice of implicit neural scene representation and rendering algorithm, namely a volumetric representation that involves a custom neural volume renderer.
- Neural surface representations for example using signed distance functions (SDFs), occupancy fields, or feature-based representations, on the other hand implicitly model the surface of objects.
- implicit neural surface representations can be shown to demonstrate impressive performance on shape reconstruction, their performance on view interpolation and synthesis tasks is limited. Thus, previous neural rendering approaches may either perform well for view synthesis or 3D shape estimation, but not both.
- Embodiments described herein provide high-capacity neural scene representations with periodic activations for jointly optimizing an implicit surface and a radiance field of a scene, supervised exclusively with posed 2D images.
- the implicit surface representation described herein enables export of a 3D mesh with view-dependent texture information.
- the embodiments described herein are compatible with traditional graphics pipelines, enabling real-time rendering rates while achieving unprecedented image quality compared to other surface methods. This approach may accelerate rendering by approximately two orders of magnitude relative to state-of-the-art neural volume rendering.
- Various embodiments of the 2D-supervised implicit neural scene representation and rendering approach include providing to a neural network a set of 2D multi-view images for optimizing representation networks modeling shape and appearance of a scene including an object.
- the scene may be modeled using a differentiable sphere tracer to generate a 3D model.
- the resulting 3D model may be exported to enable view-dependent real-time rendering using traditional graphics pipelines.
- the 3D model may be a 3D point cloud.
- the neural network may include a neural surface representation using an SDF.
- the neural network may include an SDF-based sinusoidal representation network (SIREN).
- the neural model can be used to represent a shape of the object using supervision with 2D images via neural rendering. For example, a shape of the object may be represented by obtaining a zero-level set of the signed distance function.
- the neural model may be converted into a triangular mesh representing the shape.
- the triangular mesh may be used to render multiple images representative of the 3D scene including the object. The multiple images may be based on different views based on the specific locations of a viewer's tracked eyes.
- the surfaces of objects can be extracted from neural surface models using methods including (but not limited to) the marching cubes algorithm and exported into traditional mesh-based representations for real-time rendering.
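The extraction step can be sketched by sampling the SDF on a regular grid, which is the input a marching cubes routine (for example, `skimage.measure.marching_cubes(volume, level=0.0)`) consumes to produce a triangle mesh. The analytic sphere below stands in for the learned network:

```python
import numpy as np

def sample_sdf_grid(sdf, resolution=32, extent=1.0):
    """Evaluate an SDF on a (resolution^3) grid spanning [-extent, extent]^3.

    The resulting volume is the standard input to a marching cubes routine,
    which extracts the zero-level set as a triangle mesh.
    """
    axis = np.linspace(-extent, extent, resolution)
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    pts = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)
    return sdf(pts).reshape(resolution, resolution, resolution)

sphere = lambda p: np.linalg.norm(p, axis=-1) - 0.5  # stand-in for the network
volume = sample_sdf_grid(sphere)
# A surface exists wherever the sampled values change sign between neighbors.
print(volume.min() < 0 < volume.max())  # -> True
```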
- views are generated corresponding to the specific locations of tracked eyes.
- the views may be displayed on a light field display.
- the light field display may include eye trackers which may track the user's eyes. Examples of systems and methods for generating different views based on a viewer's tracked eyes are described in U.S. Pat. Pub. No. 2021/0132693, entitled “Light Field Displays Incorporating Eye Trackers and Methods for Generating Views for a Light Field Display Using Eye Tracking Information” and filed Nov. 2, 2020 which is hereby incorporated by reference in its entirety for all purposes.
- the views may be displayed on a head mounted display, with different views displayed according to the position of the user's head, eyes, pupils, and/or gaze direction.
- the head mounted display (e.g., an AR, mixed reality, or VR headset) may include a tracker for tracking the user's head position and/or gaze direction.
- Various disclosed embodiments include SDF-based SIREN as the backbone of a neural rendering system, which enables representation of signals with significantly higher complexity within the same number of learnable parameters compared to existing art, such as non-periodic multilayer perceptrons (MLP).
- neural implicit representations that use implicitly defined volumes are distinct from those using implicitly defined surfaces, for example represented as signed distance functions (SDFs) or occupancy networks.
- Surface-based representations may allow for traditional mesh representations to be extracted and rendered efficiently with traditional computer graphics pipelines.
- examples of such approaches include neural radiance fields (NeRF), a volumetric representation, and the implicit differentiable renderer (IDR), a surface-based representation.
- the SDF-based SIREN approach may be used to learn 3D shapes using 2D supervision with images via neural rendering.
- several embodiments of the invention use a novel loss function that preserves the SIREN's high-capacity encoding for the supervised images while constraining that capacity in the angular domain to prevent overfitting on these views.
- this training procedure allows for robust fitting of a SIREN-based SDF directly to a sparse set of multi-view images.
- 2D-supervised implicit neural scene representation and rendering approaches can perform on par with NeRF on view interpolation tasks while providing a high-quality 3D surface that can be directly exported for real-time rendering.
- the SDF-based SIREN system may include a neural network acting as a shape and appearance renderer to learn a 3D representation of the object in the form of a SDF zero-level set and appearance function.
- a 3D mesh (e.g., a 3D model) may be extracted from the learned representation.
- the triangle mesh may be provided as an input to a real-time renderer for generation of specific views from a given viewpoint to be displayed to a viewer.
- the neural network and the training thereof takes place remotely from the display device, while a real-time renderer may reside on the display device, such as a light field display or a head mounted display.
- a real-time renderer may reside on the display device, such as a light field display or a head mounted display.
- all of the neural network, the training thereof, and the real-time image rendering are located off the display device or on the display device itself.
- Various embodiments of the invention include a neural rendering framework including an implicit neural 3D scene representation, a neural renderer, and a custom loss function for training.
- this approach may achieve 10 times higher rendering rates than NeRF while providing comparable image quality with the additional benefit of optimizing an implicitly defined surface.
- both shape and view-dependent appearance of the neural scene representation can be exported and rendered in real time using traditional graphics pipelines.
- a custom camera array may capture several datasets of faces and heads for providing baselines, which may be used for standardizing the approach for various objects and applications.
- FIG. 1 A illustrates a light field display 100 that includes a display area 102 formed by an array of angular pixels and one or more eye trackers 104 in accordance with an embodiment of the invention.
- Each angular pixel can be thought of as similar to a conventional pixel in a 2D display with the difference that its appearance can vary across the field of view of the display. In this way, each viewing zone of the light field display can display an image with a resolution equal to the number of angular pixels in the light field display.
- Each angular pixel can include an array of light emitters, such as described for example in US Pat. Pub. No.
- the light field display 100 may include a computing system for rendering multiple 2D views. An example of this computing system may be found in FIG. 3 described in detail below.
- the light field display 100 may be connected to a separate computing system and may receive the pre-rendered multiple 2D views from the separate computing system.
- the light field display 100 may export the eye-tracking data from the one or more eye trackers 104 to the separate computing system which may use this eye-tracking data to generate corresponding 2D views.
- FIG. 1 B illustrates a head mounted display that displays a left image and a right image based on a user's head position.
- the head mounted display 150 may include a left display 152 and a right display 154 for displaying different images to the left eye and the right eye of the user respectively.
- the head mounted display 150 may also include a head tracker which may track the position of the user's head. In some embodiments, the head tracker may track the position of the user's head, eyes, pupils, and/or gaze direction.
- the head mounted display 150 may include a computing system for rendering multiple 2D views, such as that shown in FIG. 3 .
- the head mounted display 150 may be connected to a separate computing system and may receive the pre-rendered multiple 2D views from the separate computing system.
- the head mounted display 150 may export the head position data to the separate computing system which may use this head position data to generate corresponding 2D views.
- FIG. 2 illustrates a schematic representation of a neural network system 200 for modeling the shape and appearance of a 3D object 202 using implicit functions, in accordance with an embodiment of the invention.
- the neural network system 200 may include a shape (SDF) renderer 204 which feeds into an appearance renderer 206 .
- the input provided to shape (SDF) renderer 204 can include a set of 2D images of the 3D object 202 within a scene.
- the input provided to the shape (SDF) renderer 204 may be a 3D mesh (e.g., from another method or a depth sensor).
- the shape SDF renderer 204 models the shape of the 3D object using a differentiable sphere tracer.
- the output from shape (SDF) renderer 204 and appearance renderer 206 may be a 3D neural model 208 of 3D object 202 .
- the resulting 3D neural model 208 may be, for instance, a 3D mesh that may be exported to a real-time image rendering pipeline for producing a variety of view-dependent images 210 of the original 3D object 202 by, for example, converting the 3D mesh into a triangle mesh and a plurality of textures.
- FIG. 3 is a block diagram of a computing system 300 , which models the shape and appearance of a 3D object from a set of 2D views, in accordance with an embodiment of the invention.
- the computing system 300 may be a standalone computing system or a system implemented on a head mounted display as described in connection with FIG. 1 B or a light field display as described in connection with FIG. 1 A .
- the computing system 300 includes a processor 302 for controlling the operations of an input/output interface 304, which is capable of receiving and transmitting data (such as a set of 2D views of a 3D object, a 3D neural model such as an object model, and a real-time rendering of view-dependent images), and a memory 306.
- the memory 306 includes programming including a shape renderer 308 and an appearance renderer 310 which is executable by the processor.
- memory 306 may further include a real-time renderer 312 for using the output from appearance renderer 310 to generate in real-time a plurality of view-dependent images for displaying to a viewer.
- the real-time renderer 312 may be implemented separately from computing system 300 , for example, at the display device such as a head mounted display or a light field display.
- the shape renderer 308 and the appearance renderer 310 include a neural network.
- the input/output 304 may receive a set of 2D images of a 3D object and feed these 2D images into the shape renderer 308 and the appearance renderer 310 .
- the neural network includes a sinusoidal representation network based on a signed distance function.
- the appearance renderer 310 may model the appearance of the object.
- the shape renderer 308 may generate a neural model of a shape of an object by obtaining a zero-level set of a signed distance function.
- the neural model may then be converted by real-time renderer 312 into a triangular mesh representing the shape of the 3D object being imaged so as to render multiple view-dependent images representative of the shape of the 3D scene including the object using the triangular mesh.
- the input/output 304 may provide as an output the shape and the appearance from shape renderer 308 and appearance renderer 310 to an external rendering system, which may be used to render multiple view-dependent 2D images for display to a viewer.
- input/output 304 may directly provide the view-dependent 2D images from real-time renderer as the output.
- the input/output 304 may receive tracking information such as head tracking data or eye-tracking data which may be used by real-time renderer 312 to generate corresponding views.
- FIG. 4 is a flowchart of a method for rendering in real-time a plurality of view-dependent images, in accordance with an embodiment of the invention.
- the method 400 includes providing ( 402 ), to a neural network, a plurality of 2D images of a 3D object.
- the neural network may be implemented, for example, as shape renderer 308 and/or the appearance renderer 310 of FIG. 3 .
- the neural network includes a SDF-based sinusoidal representation network.
- the method 400 further includes modeling ( 404 ) a shape of the 3D object by obtaining a zero level set.
- the zero level set may be of the signed distance function.
- Step 404 may be implemented, for example, in shape renderer 308 of FIG. 3 .
- the method 400 further includes modeling ( 406 ) an appearance of the 3D object using a spatially varying emission function. Step 406 may be implemented, for instance, at appearance renderer 310 of FIG. 3 .
- method 400 further includes combining ( 408 ) the shape and appearance information from steps 404 and 406 to generate a neural model of the 3D object.
- Step 408 may be implemented, for instance, in the processor 302 and/or the memory 306 of FIG. 3 .
- the method 400 further includes converting ( 410 ) the neural model into a triangular mesh representing the 3D object.
- Step 410 may be implemented, for example, in the real-time renderer 312 of FIG. 3 .
- the method 400 further includes rendering ( 412 ) multiple view-dependent images of the 3D object using the triangular mesh from step 410 .
- representation of both shape and appearance of 3D objects may be performed using implicit functions in a framework similar to IDR.
- the network architecture may be built on a SIREN, which can allow representation of signals of significantly higher complexity compared with common non-periodic multilayer perceptrons (MLP) using the same number of learnable parameters.
- the shape may be represented as the zero-level set S_0 = {x : S(x) = 0} of a signed distance function (SDF) S(x; θ): ℝ³ → ℝ, (1) where x ∈ ℝ³ is a location in 3D space and θ is a learnable parameter of the sinusoidal representation network.
- a spatially varying emission function, or radiance field, E may be defined for directions r_d ∈ ℝ³ in a global coordinate system.
- This formulation may not allow for relighting but can enable photorealistic reconstruction of the appearance of a scene under fixed lighting conditions. In some embodiments, modeling lighting and shading may be performed.
- the parameters ϕ may be used to increase the network capacity and allow for modeling of fine spatial details and microreflections that are of a notably higher spatial complexity than the underlying shape.
- the radiance field may be expressed as E(x, r_d, n; θ, ϕ): ℝ⁹ → ℝ³ (2) to represent RGB appearance using the additional learnable parameters ϕ.
- the neural rendering may be used to project a 3D neural scene representation into one or multiple 2D images. In some embodiments, this may be performed in two steps: 1) Find the 3D surface as the zero-level set S 0 closest to the camera origin along each ray; 2) Resolve the appearance by sampling the local radiance E.
- Sphere tracing the SDF may be used in step 1) to find S 0 .
- a view and a projection matrix, V ∈ ℝ^{4×4} and P ∈ ℝ^{4×4}, may be defined similar to OpenGL's rendering API.
- the sphere-tracing algorithm may iteratively advance a point x_n along each ray by the current signed distance value until it converges to the surface point, taking S_0 ≈ x_n; a residual |S(x_n)| < 0.005 may be tolerated. Gradients may be retained in the last step rather than for all steps of the sphere tracer.
- this approach makes sphere tracing memory efficient.
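A minimal, non-differentiable sketch of the sphere-tracing step follows; the analytic sphere again stands in for the learned SDF, the 0.005 tolerance follows the text, and the step limit and escape distance are illustrative assumptions:

```python
import numpy as np

def sphere_trace(sdf, ray_origin, ray_dir, tol=0.005, max_steps=64):
    """March along the ray by the SDF value until within tol of the surface.

    Returns the hit point S0 (the surface point closest to the camera along
    the ray), or None if the ray escapes the scene.
    """
    x = np.asarray(ray_origin, float)
    d = np.asarray(ray_dir, float)
    d = d / np.linalg.norm(d)
    for _ in range(max_steps):
        s = sdf(x)
        if abs(s) < tol:
            return x          # converged onto the zero-level set
        if s > 10.0:          # illustrative escape threshold
            return None
        x = x + s * d         # safe step: SDF value bounds distance to surface
    return None

sphere = lambda p: np.linalg.norm(p) - 0.5  # stand-in for the learned SDF
hit = sphere_trace(sphere, [0.0, 0.0, -2.0], [0.0, 0.0, 1.0])  # hits near z = -0.5
```

A differentiable training implementation would, as the text notes, retain gradients only for the final step to keep memory usage low.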
- the appearance may be directly sampled from the radiance field as E(S_0, r_d, ∇S(S_0); θ, ϕ).
- the reconstruction loss may be defined over foreground pixels, i.e. those where M_U = 1, as
- L_R = (1/|U|) Σ_{c ∈ I_U^f} ‖E(x, r_d, n; θ, ϕ) − c‖ (6)
- c is an RGB value of a foreground pixel in a mini-batch.
- S may be regularized by an eikonal constraint.
- random points x_r may be uniformly sampled from a cube that encapsulates the object's bounding sphere of unit radius.
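The eikonal regularizer penalizes deviation of ‖∇S‖ from 1 at the sampled points. The sketch below uses central finite differences in place of the automatic differentiation a training implementation would use; point counts and step sizes are illustrative:

```python
import numpy as np

def eikonal_loss(sdf, n_points=1000, half_side=1.0, eps=1e-4, seed=0):
    """Mean squared deviation of the SDF gradient norm from 1 at random
    points uniformly sampled in the cube [-half_side, half_side]^3."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-half_side, half_side, size=(n_points, 3))
    grad = np.empty_like(x)
    for i in range(3):
        h = np.zeros(3)
        h[i] = eps
        grad[:, i] = (sdf(x + h) - sdf(x - h)) / (2 * eps)  # central difference
    return np.mean((np.linalg.norm(grad, axis=1) - 1.0) ** 2)

sphere = lambda p: np.linalg.norm(p, axis=-1) - 0.5  # a true SDF: loss near 0
print(eikonal_loss(sphere) < 1e-5)  # -> True
```

A function that is not a valid distance field (for example, 2x the sphere SDF) yields a large loss, which is what drives S toward a true signed distance function during training.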
- the coarse shape may be restricted by enforcing its projected pattern to fall within the boundaries of the object masks.
- the soft mask loss may be used for the pixels other than the foreground pixels, with softness parameter α, as
- L_M = (1/(α|U|)) Σ_{m ∈ M_U∖U^f} BCE(sigmoid(−α S_min), m) (8)
- BCE is the binary cross entropy
- S_min = arg min_t S(r_0 + t r_d; θ) is the minimum S value along the entire ray, approximated by dense sampling of t.
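The soft mask loss (8) can be sketched as follows, assuming per-ray minimum SDF values are already available; the softness parameter value chosen below is illustrative, not from the text.

```python
import numpy as np

def soft_mask_loss(s_min, mask, alpha=50.0):
    """Soft mask loss of Equation (8): binary cross entropy between
    sigmoid(-alpha * S_min) and the object mask m, scaled by 1/alpha.
    The softness parameter alpha is an illustrative choice."""
    p = 1.0 / (1.0 + np.exp(alpha * np.asarray(s_min, dtype=float)))
    eps = 1e-7  # numerical floor inside the logs
    bce = -(mask * np.log(p + eps) + (1.0 - mask) * np.log(1.0 - p + eps))
    return bce.mean() / alpha

# Two illustrative rays: the first passes through the object (S_min < 0),
# the second misses it (S_min > 0).
s_min = np.array([-0.2, 0.3])
mask = np.array([1.0, 0.0])
loss = soft_mask_loss(s_min, mask)
```

Rays that agree with the mask give a near-zero loss, while a sign mismatch is penalized heavily, pushing the coarse shape inside the mask boundaries.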
- the radiance field E may be regularized to avoid overfitting to training views.
- SIRENs have a remarkable regressive potential, which biases them to overfit the appearance to the training views.
- This power may be leveraged to allow for encoding of photorealistic surface details, but the behavior of E in the angular domain, conditioned by r_d, may be restricted to achieve favorable interpolation behavior.
- the angular behavior may be linearized using a smoothness term
- the loss may be optimized in mini-batches of 50,000 individual rays sampled uniformly across the entire training dataset.
- a large batch size and uniform ray distribution may be critical to prevent local overfitting of the SIREN, especially for the high-frequency function E.
- MLPs representing S and E may be implemented as SIRENs with 5 layers using 256 hidden units each. Additionally, Fourier features {sin(2^k π r_d), cos(2^k π r_d)} of the ray direction may be provided to E.
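Such a ray-direction encoding can be sketched as follows; the number of frequency bands and the concatenation with the raw direction are assumptions, since the text does not specify them.

```python
import numpy as np

def fourier_features(rd, num_bands=4):
    """Encode a ray direction with Fourier features
    sin(2^k * pi * rd), cos(2^k * pi * rd) for k = 0..num_bands-1,
    concatenated to the raw direction. The band count is illustrative."""
    rd = np.asarray(rd, dtype=float)
    feats = [rd]
    for k in range(num_bands):
        feats.append(np.sin((2.0 ** k) * np.pi * rd))
        feats.append(np.cos((2.0 ** k) * np.pi * rd))
    return np.concatenate(feats)

# A 3D direction expands to 3 + num_bands * 2 * 3 feature values.
encoded = fourier_features([0.0, 0.0, 1.0])
```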
- S may be initialized to a unit sphere of radius 0.5 by pretraining to a procedural shape.
- the object rays may be traced in a larger sphere of radius 1, but the smaller initial radius improves the initial fit as well as the consequent convergence rate.
- the modeling of the shape, the modeling of the appearance, the linearizing, and the optimizing discussed above may be combined to generate a neural model representing the 3D object.
- the loss may be optimized using the Adam solver with an initial learning rate of 10⁻⁴, decreased by a factor of 2 every 40,000 batches, for an overall training length of 150,000 batches on a processor such as a single Nvidia RTX 2080Ti GPU.
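The schedule amounts to a simple step decay, sketched here (the Adam optimizer setup itself is omitted):

```python
def learning_rate(batch_idx, base_lr=1e-4, decay_every=40_000, factor=2.0):
    """Step decay from the text: start at 1e-4 and halve every
    40,000 batches over a 150,000-batch training run."""
    return base_lr / (factor ** (batch_idx // decay_every))
```

Over the full run the rate therefore steps through 1e-4, 5e-5, 2.5e-5, and 1.25e-5.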
- the training data may be a set of 2D images capturing the 3D object from multiple different angles at the same moment of time. The positions of the cameras capturing the 2D images are also known via calibration processes.
- the sphere tracer may not run at real-time rates for moderate to high image resolutions.
- the compactness of the surface-based representation makes it practical to convert the neural model to a triangular mesh suitable for real-time computer graphics applications.
- unstructured lumigraph rendering which preserves view-dependent effects learned by a neural representation may be used.
- the marching cubes algorithm may be used to extract a high-resolution surface mesh from the SDF S voxelized at a resolution of 512³. Instead of extracting the zero-level set, offsetting the iso-surface of S by 0.5% of the object radius in the outward direction may optimize the resulting image quality.
- the optimized emissivity function E may be resampled to synthesize projective textures T i for N camera poses and corresponding projection matrices. The ability to resample the camera poses for efficient viewing space coverage may be advantageous. In some embodiments, the choice of N and camera distributions may be optimized.
- the extracted mesh may be rasterized using OpenGL and the vertex positions may be projected to each pixel.
- angles τ_1 . . . τ_N may be computed between the ray towards the current rendering camera and the rays towards each of the N projective texture map viewpoints.
- This formulation may satisfy epipolar consistency by converging to an exclusive mapping to texture T_j when τ_j → 0. Additionally, samples from occluded textures may be discarded by setting their w_i to zero. Occlusions may be detected by comparing the pre-rendered depth associated with a texture with the distance between the mesh surface point and the texture viewpoint. The same technique is commonly used in real-time graphics for shadow mapping.
- NLR-RAS denotes the real-time rasterized neural lumigraph renderer, and NLR-ST the sphere-traced renderer.
- Table 1 illustrates rendering time and representation size comparison for the DTU scan 65 at 1600 ⁇ 1200 pixel resolution.
- “Real-time” denotes frame rates of at least 60 fps.
- the capacity of SIREN allows for a smaller and faster model, which is evident by the model size.
- the implicit volumetric rendering may be costly. Only the explicit representations of Colmap and the NLR-RAS allow for truly real-time performance with framerates over 60 fps at HD resolution on commodity hardware.
- the initial dataset may include seven multiview captures showing a person performing facial expressions.
- a custom camera array may be used to capture the dataset.
- FIG. 5 shows a frontal view of the custom camera array in accordance with an embodiment of the invention.
- the camera array 500 may include six Back-Bone cameras 502 in the center of the array and 16 background cameras 504 placed around them.
- the Back-Bone cameras 502 and the background cameras 504 may include large circular lenses.
- the Back-Bone cameras 502 and the background cameras 504 may be GoPro HERO 7 Cameras.
- a subject may be at a distance of 60 cm from the cameras 502, 504, and the cameras 502, 504 may cover approximately 100°.
- the Back-Bone cameras 502 may be modified GoPro cameras that can fit a standard C-Mount lens.
- the background cameras 504 may be unmodified Go-Pro cameras.
- the Back-Bone cameras 502 may have a narrower field-of-view (FoV) and are thus able to capture the subject in more detail.
- the Back-Bone cameras 502 may capture at 4K/30 fps in portrait orientation, and the background GoPro cameras 504 at 1080p/60 fps in landscape orientation.
- the camera shutter may be triggered with a remote connected wirelessly.
- the cameras 502 , 504 do not support a generator lock, so during capture they may be only loosely synchronized. Videos may be used for the dataset, even in the cases in which only a static frame is used.
- an ArUco marker may be flashed on a cellphone before each capture. The first frame that sees the marker may be detected in each video, which allows synchronization of the cameras 502, 504 with an accuracy of 1 frame or better.
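The alignment step can be sketched as follows; the marker detection itself (e.g. with OpenCV's ArUco module) is assumed to have run upstream and is not shown, and the function name is illustrative.

```python
def sync_trim_points(marker_seen):
    """Per camera, find the first frame index in which the flashed ArUco
    marker is visible; trimming each video at that frame aligns the
    loosely synchronized streams to within about one frame.

    `marker_seen` is a list (one entry per camera) of per-frame booleans
    produced by a marker detector upstream.
    """
    return [flags.index(True) for flags in marker_seen]

# Hypothetical detections for three cameras with different start lags.
trim = sync_trim_points([
    [False, False, True, True],  # this camera first sees the marker at frame 2
    [False, True, True, True],   # frame 1
    [True, True, True, True],    # frame 0
])
```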
- the method is able to achieve state-of-the-art image reconstruction quality on-par with volumetric methods such as NeRF while allowing for efficient surface reconstruction utilized for real-time rendering.
- the disclosed method was compared to novel view synthesis techniques with various scene representations. Specifically, the method was compared to the traditional multi-view stereo of Colmap, the explicit volumetric representation of Neural Volumes (NV), the implicit volume representation of NeRF [38], and the implicit signed distance function of IDR.
- a multiple view stereo (MVS) dataset was used with 49 or 64 calibrated camera images along with object masks to measure the image reconstruction error metrics. Three views were held out for testing. The image quality was significantly better than that of IDR. This may be attributed in major part to the unparalleled representation capacity of SIRENs.
- the shape reconstruction error was reported as Chamfer distance from the ground-truth provided in the dataset. Although the shape reconstruction may not be the goal, the error may be on par with other techniques, though worse than IDR which explicitly focuses on shape reconstruction. This may be a trade-off between the accuracy of view-dependent and high-frequency details in the image reconstruction on one hand, and the view consistency reflected in the geometry on the other.
- the angular smoothness loss L_S may be specifically designed to avoid collapse of the emissivity function E for interpolated views.
- the efficiency was tested quantitatively by measuring the image reconstruction error on test views. There was a measurable quality drop when compared to the training views observed consistently for all of the methods. However, the interpolated views produced by the method maintain many of the favorable characteristics from the regression case.
- View-synthesis of human subjects may be particularly challenging due to the complex reflection properties of skin, eyes and hair, as well as a lack of high-quality multi-view data.
- the first challenge may be addressed with the high-capacity representation network and the latter with the dataset.
- the disclosed method achieves a bigger advantage for very high-resolution (3000 ⁇ 4000 px) detailed images. This may show that the traditional ReLU based networks used by IDR and NeRF have reached their capacity, while the explicit representations of Colmap and NV lack easy scaling.
- the performance of the method may be verified based on the choice of the representation and training procedure.
- a standard MLP with ReLU may not have the capacity to train a detailed representation.
- SIREN remedies this but may quickly overfit to the trained pixels. This may be resolved by adding the angular smoothness loss L_S that regularizes behavior in the angular domain, and then by increasing the batch size in order to achieve spatially uniform image quality. Additional Fourier features of the ray direction may remove low-frequency noise in E.
- Various embodiments of the disclosure include a neural rendering framework that optimizes an SDF-based implicit neural scene representation given a set of multi-view images.
- This framework may be unique in combining a representation network architecture using periodic activations with a sphere-tracing based neural renderer that estimates the shape and view-dependent appearance of the scene. Enabled by a novel loss function that is applied during training, the framework may achieve a very high image quality that is comparable with state-of-the-art novel view synthesis methods.
- the neural representation can be directly converted into a mesh with view-dependent textures that enable high-quality 3D image synthesis in real time using traditional graphics pipelines.
- emissive radiance functions may model a scene under fixed lighting conditions. Some embodiments may include dynamic lighting and shading. Further, similar to IDR, the disclosed method may benefit from annotated object masks. Automatic image segmentation may be used to alleviate the need for annotated object masks. Although the synthesized image quality of the discussed approach is competitive with the state of the art, the proxy shapes produced by the disclosed method may not be quite as accurate as those of alternative approaches. While this may not be important for the novel view synthesis applications, other applications may benefit from estimating more accurate shapes. Some embodiments may include occasional visible seam artifacts caused by inaccuracies of the camera calibration. Similar to some other recent neural rendering pipelines, the disclosed neural rendering pipeline focuses on overfitting a neural representation on a single 3D scene.
- Some embodiments include learning shape spaces, or priors, for certain types of objects, such as faces. While several methods have explored related strategies using conditioning-by-concatenation, hypernetwork, or metalearning approaches using synthetic data, there is a lack of publicly available photorealistic multi-view image data. Although the inference time of the disclosed method is fast, the training time may still be slow. More computing resources may allow exploring dynamic video sequences.
- Emerging neural rendering approaches may outperform traditional vision and graphics approaches.
- Traditional graphics pipelines still offer significant practical benefits, such as real-time rendering rates, over these neural approaches.
- Embodiments disclosed previously take a significant step towards closing this gap, which may be a critical aspect for making neural rendering practical.
Description
- defining a view: V ∈ ℝ^{4×4};
- defining a projection matrix: P ∈ ℝ^{4×4};
- solving for a ray origin: r0=(V−1·[0,0,1,0]T)x,y,z;
- solving for a ray direction: rd=v((P·V)−1·[ux,uy, 0,1]T), where (·)x,y,z are vector components and v(ω)=ωx,y,z/∥ωx,y,z∥ is vector normalization;
- minimizing |S(x,θ)| along each ray using iterative updates of a form: x0=r0 and xi+1=xi+S(xi)rd; and
- solving for a zero-set of rays converged to a foreground object for a step count n: S0={xn|S(xn)=0}.
E(x, r_d, n; θ, ϕ): ℝ⁹ → ℝ³,
R = Σ_{i=1...k} w_i T_i,
S(x; θ): ℝ³ → ℝ, (1)
where x ∈ ℝ³ is a location in 3D space and θ is a learnable parameter of the sinusoidal representation network.
E(x, r_d, n; θ, ϕ): ℝ⁹ → ℝ³ (2)
to represent RGB appearance using the additional learnable parameters ϕ.
r_0 = (V⁻¹ · [0,0,1,0]^T)_{x,y,z} (3)
r_d = v((P·V)⁻¹ · [u_x, u_y, 0, 1]^T) (4)
where (·)_{x,y,z} are vector components and v(ω) = ω_{x,y,z}/‖ω_{x,y,z}‖ is vector normalization.
x_0 = r_0, x_{i+1} = x_i + S(x_i) r_d. (5)
Finally, S_0 = {x_n | S(x_n) = 0} may be the zero-set of rays converged to a foreground object for the step count n = 16. A small residual |S(x_n)| < 0.005 may be tolerated. Gradients may be retained in the last step rather than for all steps of the sphere tracer. Advantageously, this approach makes sphere tracing memory efficient. The appearance may be directly sampled from the radiance field as E(S_0, r_d, ∇S(S_0); θ, ϕ).
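Equations (3)-(4) can be sketched directly; the OpenGL-style matrix convention follows the text, while the identity matrices in the example call are purely illustrative.

```python
import numpy as np

def camera_ray(V, P, ux, uy):
    """Ray origin and direction per Equations (3)-(4):
    r0 = (V^-1 . [0,0,1,0]^T)_{x,y,z} and
    rd = v((P.V)^-1 . [ux, uy, 0, 1]^T), where v() normalizes the
    x,y,z components of a homogeneous vector."""
    r0 = (np.linalg.inv(V) @ np.array([0.0, 0.0, 1.0, 0.0]))[:3]
    w = np.linalg.inv(P @ V) @ np.array([ux, uy, 0.0, 1.0])
    rd = w[:3] / np.linalg.norm(w[:3])
    return r0, rd

# Illustrative call with identity view and projection matrices.
r0, rd = camera_ray(np.eye(4), np.eye(4), 0.5, 0.0)
```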
where c is an RGB value of a foreground pixel in a mini-batch. Both L1 and L2 work well but L1 may produce marginally sharper images.
to enforce its metric properties important for efficient sphere tracing. Random points xr may be uniformly sampled from a cube which encapsulates the object's bounding unit radius sphere.
where BCE is the binary cross entropy and S_min = arg min_t S(r_0 + t r_d; θ) is the minimum S value along the entire ray, approximated by dense sampling of t.
Note that such level of control is unique to SIREN and related architectures as they may be C∞ differentiable.
with weights wE=0.1, wM=100, and wS=0.01. The performance may not be very sensitive to the weight choices with the exception of wS where large values cause high-frequency artifacts in S.
R = Σ_{i=1...k} w_i T_i (11)
where weights w_i are computed as
ŵ_i = (1/τ_i)(1 − τ_i/τ_k) (12)
w_i = ŵ_i / Σ_{i=1...k} ŵ_i. (13)
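A sketch of the blending weights in Equations (12)-(13), including the occlusion zeroing described earlier; the angle values and visibility flags below are hypothetical.

```python
import numpy as np

def blend_weights(tau, visible):
    """Unstructured-lumigraph weights for the k nearest texture
    viewpoints, with tau sorted in ascending order so tau[-1] = tau_k.

    Occluded textures get w_i = 0 before normalization, mirroring the
    depth-comparison test described in the text.
    """
    tau = np.asarray(tau, dtype=float)
    w_hat = (1.0 / tau) * (1.0 - tau / tau[-1])   # Equation (12)
    w_hat = np.where(visible, w_hat, 0.0)         # discard occluded textures
    return w_hat / w_hat.sum()                    # Equation (13)

# Hypothetical angles for k = 3 textures; the middle one is occluded.
w = blend_weights([0.1, 0.2, 0.4], visible=[True, False, True])
```

Note that ŵ_k is zero by construction, and ŵ_j grows without bound as τ_j → 0, which yields the exclusive mapping (epipolar consistency) mentioned above.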
TABLE 1
Method | Render time [s] | Model size [MB]
---|---|---
Colmap | Real-time | 30.39
IDR | 45 | 11.13
NV | 0.65 | 438.36
NeRF | 150 | 2.27
NLR-ST | 13 | 2.07
NLR-RAS | Real-time | 34.68
- Item 1: An image rendering method for generating view-dependent images for a display including a plurality of pixels, includes:
- providing, to a neural network, a plurality of 2D images of a 3D object, wherein the neural network includes a signed distance function including a sinusoidal representation network, wherein the signed distance function is represented by S(x; θ): ℝ³ → ℝ, where x ∈ ℝ³ is a location in 3D space and θ is a first learnable parameter of the sinusoidal representation network, and wherein the neural network further includes a spatially varying emission function E expressible as: E(x, r_d, n; θ, ϕ): ℝ⁹ → ℝ³, where ϕ is a second learnable parameter of the sinusoidal representation network;
- modeling, using a shape renderer, a shape of the 3D object by obtaining a zero-level set of the signed distance function including sphere tracing the signed distance function by:
- defining a view: V ∈ ℝ^{4×4};
- defining a projection matrix: P ∈ ℝ^{4×4};
- solving for a ray origin: r0=(V−1·[0,0,1,0]T)x,y,z;
- solving for a ray direction: rd=v((P·V)−1·[ux,uy,0,1]T), where (·)x,y,z are vector components and v(ω)=ωx,y,z/∥ωx,y,z∥ is vector normalization;
- minimizing |S(x,θ)| along each ray using iterative updates of a form: x0=r0 and xi+1=xi+S(xi)rd; and
- solving for a zero-set of rays converged to a foreground object for a step count n: S0={xn|S(xn)=0};
- modeling, using an appearance renderer, an appearance of the 3D object using E for directions rd∈ 3 in a global coordinate system, wherein modeling the appearance of the object includes:
- conditioning E by a local normal direction n=∇xS(x) as computed by automatic differentiation;
- minimizing an image reconstruction error for the 3D object in foreground pixels of the display, wherein the image reconstruction error is represented by:
- L_R = (1/|U|) Σ_{c ∈ I_U^f} |E(x, r_d, n; θ, ϕ) − c|,
- where c is an RGB value of a foreground pixel of the display and U represents a portion of the pixels with RGB values I_U and object masks M_U;
- regularizing the signed distance function by an eikonal constraint represented by:
- L_E = (‖∇_x S(x_r; θ)‖₂ − 1)², evaluated at random points x_r uniformly sampled from a cube which encapsulates the object's bounding unit radius sphere,
- where U represents a portion of the pixels with RGB values IU and object masks MU;
- restricting the coarse shape, using a soft mask loss defined for pixels other than the foreground pixels, by enforcing a projected pattern to fall within the boundaries of the object masks, wherein the soft mask loss is represented by:
- L_M = (1/(α|U|)) Σ_{m ∈ M_U∖U^f} BCE(sigmoid(−α S_min), m),
- where BCE is the binary cross entropy and Smin=arg mint S(r0+trd; θ) is the minimum S value along the entire ray approximated by dense sampling of t; and
- regularizing the emissivity function to avoid overfitting to training views by linearizing the angular behavior using a smoothness term represented by:
-
-
-
- optimizing the first learnable parameter θ and the second learnable parameter ϕ as:
- {θ, ϕ} = arg min_{θ,ϕ} (L_R + w_E L_E + w_M L_M + w_S L_S),
- where w_E, w_M, and w_S are weights for their respective loss functions;
- combining outputs from the modeling of the shape, the modeling of the appearance, the linearizing, and the optimizing steps to generate a neural model representing the 3D object;
- converting the neural model into a triangular mesh representing the 3D object; and
- rendering multiple view-dependent images representative of the 3D object using the triangular mesh.
- Item 2: The method of Item 1, further including displaying the multiple view-dependent images to a user using a display.
- Item 3: The method of Item 1, wherein rendering multiple view-dependent images is based upon a location of a viewer's eyes.
- Item 4: The method of Item 3, wherein rendering multiple view-dependent images may be based upon a location of a viewer's head.
- Item 5: The method of Item 1, further including displaying the multiple view-dependent images on a light field display.
- Item 6: The method of Item 1, further including displaying the multiple view-dependent images on a head mounted display.
- Item 7: The method of Item 1, wherein rendering multiple view-dependent images includes:
- rasterizing the triangular mesh;
- projecting vertex positions to each pixel; and
- computing angles τ_1 . . . τ_N between a ray towards a rendering camera and rays towards each of N projective texture map viewpoints.
- Item 8: The method of Item 1, further including applying unstructured lumigraph rendering to blend contributions from the first k textures T_i, sorted by τ_i in ascending order, to create a rendered image represented by:
- R = Σ_{i=1...k} w_i T_i,
- where weights w_i are computed as ŵ_i = (1/τ_i)(1 − τ_i/τ_k) and w_i = ŵ_i / Σ_{i=1...k} ŵ_i.
-
Claims (24)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/644,291 US11922562B2 (en) | 2020-12-14 | 2021-12-14 | Methods and systems for rendering view-dependent images using 2D images |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202063125288P | 2020-12-14 | 2020-12-14 | |
US17/644,291 US11922562B2 (en) | 2020-12-14 | 2021-12-14 | Methods and systems for rendering view-dependent images using 2D images |
Publications (2)
Publication Number | Publication Date |
---|---|
US20220189104A1 US20220189104A1 (en) | 2022-06-16 |
US11922562B2 true US11922562B2 (en) | 2024-03-05 |
Family
ID=81941839
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/644,291 Active 2041-12-30 US11922562B2 (en) | 2020-12-14 | 2021-12-14 | Methods and systems for rendering view-dependent images using 2D images |
Country Status (2)
Country | Link |
---|---|
US (1) | US11922562B2 (en) |
WO (1) | WO2022133445A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20230196658A1 (en) * | 2021-06-08 | 2023-06-22 | Fyusion, Inc. | Enclosed multi-view visual media representation |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20220239844A1 (en) * | 2021-01-27 | 2022-07-28 | Facebook Technologies, Llc | Neural 3D Video Synthesis |
US12112427B2 (en) * | 2021-08-27 | 2024-10-08 | Snap Inc. | High-definition real-time view synthesis |
CN115035252B (en) * | 2022-06-20 | 2023-05-09 | 北京市燃气集团有限责任公司 | Three-dimensional reconstruction method and device for gas plant station based on nerve radiation field |
CN115330940B (en) * | 2022-08-09 | 2023-05-23 | 北京百度网讯科技有限公司 | Three-dimensional reconstruction method, device, equipment and medium |
US12051151B2 (en) * | 2022-12-28 | 2024-07-30 | De-Identification Ltd. | System and method for reconstruction of an animatable three-dimensional human head model from an image using an implicit representation network |
CN118037989A (en) * | 2023-12-26 | 2024-05-14 | 杭州图科智能信息科技有限公司 | Multi-view nerve implicit surface reconstruction method based on priori driving |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140009466A1 (en) | 2009-07-28 | 2014-01-09 | Technion R&D Foundation Ltd. | Photogrammetric texture mapping using casual images |
US20200342652A1 (en) | 2019-04-25 | 2020-10-29 | Lucid VR, Inc. | Generating Synthetic Image Data for Machine Learning |
US20200342656A1 (en) | 2019-04-24 | 2020-10-29 | Microsoft Technology Licensing, Llc | Efficient rendering of high-density meshes |
-
2021
- 2021-12-14 WO PCT/US2021/072920 patent/WO2022133445A1/en active Application Filing
- 2021-12-14 US US17/644,291 patent/US11922562B2/en active Active
Non-Patent Citations (9)
Title |
---|
Buehler, C. et al. "Unstructured Lumigraph Rendering" SIGGRAPH '01: Proceedings of the 28th annual conference on Computer graphics and interactive techniques; Aug. 2001; pp. 425-432. |
International Patent Application No. PCT/US2021/072920, Search Report and Written Opinion dated Apr. 27, 2022, 12 pages. |
Liu, S. et al. "DIST: Rendering Deep Implicit Signed Distance Function with Differentiable Sphere Tracing"; arXiv:1911.13225v2; Jun. 2020; 11 pages. |
Loubet, G. et al. "Reparameterizing Discontinuous Integrands for Differentiable Rendering" ACM Transactions on Graphics; vol. 38; Issue 6; Dec. 2019; Article No. 228; pp. 1-14. |
Mildenhall, B. et al. "NeRF: Representing scenes as neural radiance fields for view synthesis" In Proc. ECCV, 2020; 25 pages. |
Park,J.J. et al. "DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation" Computer Vision Foundation; Jan. 23, 2019; pp. 165-172. |
Rainer, Gilles, et al. "Neural BTF compression and interpolation." Computer Graphics Forum. vol. 38. No. 2. 2019 (Year: 2019). * |
Sitzmann, V. et al. "Implicit Neural Representations with Periodic Activation Functions" 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada; Jun. 2020; 12 pages. |
Yariv, L. et al. "Multiview neural surface reconstruction by disentangling geometry and appearance" In Proc. NeurIPS, 2020; 11 pages. |
Also Published As
Publication number | Publication date |
---|---|
WO2022133445A1 (en) | 2022-06-23 |
US20220189104A1 (en) | 2022-06-16 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
AS | Assignment |
Owner name: RAXIUM, INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:WETZSTEIN, GORDON;JONES, ANDREW VICTOR;KELLNHOFER, PETR;AND OTHERS;SIGNING DATES FROM 20220126 TO 20220128;REEL/FRAME:058915/0366 |
|
AS | Assignment |
Owner name: GOOGLE LLC, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:RAXIUM INC.;REEL/FRAME:061448/0903 Effective date: 20220303 |
|
AS | Assignment |
Owner name: GOOGLE LLC, CALIFORNIA Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE EXECUTION DATE FROM 3/3/2022 TO 5/4/2022 PREVIOUSLY RECORDED ON REEL 061448 FRAME 0903. ASSIGNOR(S) HEREBY CONFIRMS THE ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:RAXIUM INC.;REEL/FRAME:063149/0640 Effective date: 20220504 |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
|
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT RECEIVED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |