WO2023244488A1 - Methods and systems for view synthesis with image relighting - Google Patents

Methods and systems for view synthesis with image relighting

Info

Publication number
WO2023244488A1
Authority
WO
WIPO (PCT)
Prior art keywords
scene
neural network
color
images
light direction
Prior art date
Application number
PCT/US2023/024796
Other languages
English (en)
Inventor
Zhong Li
Liangchen SONG
Yi Xu
Original Assignee
Innopeak Technology, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Innopeak Technology, Inc.
Publication of WO2023244488A1

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/0895 Weakly supervised learning, e.g. semi-supervised or self-supervised learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 15/00 3D [Three Dimensional] image rendering
    • G06T 15/10 Geometric effects
    • G06T 15/20 Perspective computation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 15/00 3D [Three Dimensional] image rendering
    • G06T 15/50 Lighting effects
    • G06T 15/506 Illumination models

Definitions

  • NEURAL 4D LIGHT FIELD, filed June 14, 2022, which is commonly owned and incorporated by reference herein for all purposes.
  • the present invention is directed to image processing methods and techniques.
  • a plurality of images characterizing a three-dimensional (3D) scene is obtained.
  • the plurality of images is used to determine a set of coordinates associated with a ray projecting into the 3D scene.
  • the set of coordinates serves as an input to train a neural network to predict RGB radiance for rendering the 3D scene from different viewpoints with different light conditions via a machine learning process.
  • One or more losses are calculated to refine the neural network.
  • Embodiments of the present invention can be implemented in conjunction with existing systems and processes.
  • the image processing system according to the present invention can be used in a wide variety of systems, including mobile devices, communication systems, and the like.
  • various techniques according to the present invention can be adopted into existing systems via training of the neural network(s), which is compatible with most image processing applications. There are other benefits as well.
  • a system of one or more computers can be configured to perform particular operations or actions by virtue of having software, firmware, hardware, or a combination of them installed on the system that in operation causes or cause the system to perform the actions.
  • One or more computer programs can be configured to perform particular operations or actions by virtue of including instructions that, when executed by data processing apparatus, cause the apparatus to perform the actions.
  • One general aspect includes a method for image processing. The method includes obtaining a plurality of images characterizing a three-dimensional (3D) scene, the plurality of images being characterized by a plurality of camera views and a plurality of light directions. The method also includes determining a first camera view and a first light direction associated with the 3D scene using the plurality of images, the first light direction being associated with a first pixel.
  • the method also includes calculating a set of coordinates using the first camera view and the first light direction.
  • the method also includes generating a first normal, a first albedo, and a first roughness using the set of coordinates and a first neural network.
  • the method also includes generating a first color of the first pixel using the first normal, the first albedo, the first roughness, the first light direction, and a second neural network.
  • the method also includes providing an output image associated with the 3D scene using at least the first color.
  • Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods.
  • Implementations may include one or more of the following features.
  • the method may include calculating a first loss using the first color and a ground-truth color of the first pixel.
  • the method may include updating the first neural network using the first loss.
  • the method may include calculating a second loss using the first color, the first normal, the first albedo, and the first roughness.
  • the method may include updating the second neural network using the second loss.
  • the first camera view may be different from any of the plurality of camera views.
  • the first light direction may be different from any of the plurality of light directions.
  • the set of coordinates may include a four-dimensional (4D) coordinate.
  • One general aspect includes a system for image processing.
  • the system also includes a camera module configured to capture a plurality of images characterizing a three-dimensional (3D) scene; the camera module may include one or more cameras and one or more light sources.
  • the system also includes a storage configured to store a plurality of images, the plurality of images being characterized by a plurality of camera views and a plurality of light directions.
  • the system also includes a processor coupled to the storage, the processor being configured to: retrieve the plurality of images from the storage; determine a first camera view and a first light direction associated with the 3D scene using the plurality of images, the first light direction being associated with a first pixel; calculate a set of coordinates using the first camera view and the first light direction; generate a first normal, a first albedo, and a first roughness using the set of coordinates and a first neural network; generate a first color of the first pixel using the first normal, the first albedo, the first roughness, the first light direction, and a second neural network; and provide an output image associated with the 3d scene using at least the first color.
  • Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods.
  • Implementations may include one or more of the following features.
  • the system may include a display configured to display the output image.
  • the plurality of images is captured by a plurality of cameras under a plurality of directional lights.
  • the processor is further configured to: calculate a first loss using the first color and a ground-truth color of the first pixel and update the first neural network using the first loss.
  • the processor is further configured to: calculate a second loss using the first color, the first normal, the first albedo, and the first roughness; and update the second neural network using the second loss.
  • Implementations of the described techniques may include hardware, a method or process, or computer software on a computer-accessible medium.
  • One general aspect includes a method for image processing.
  • the method also includes obtaining a first image associated with a three-dimensional (3D) scene, the first image being characterized by a first camera view and a first light direction.
  • the method also includes determining a second camera view and a second light direction associated with the 3D scene, the second camera view being different from the first camera view, the second light direction being different from the first light direction.
  • the method also includes calculating a set of coordinates using the second camera view and the second light direction.
  • the method also includes generating a first normal, a first albedo, and a first roughness using the set of coordinates and a first neural network.
  • the method also includes generating a first color using the first normal, the first albedo, the first roughness, the first light direction, and a second neural network, the first color being associated with the second light direction.
  • the method also includes providing an output image associated with the 3D scene using at least the first color.
  • Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods.
  • Implementations may include one or more of the following features.
  • the 3D scene may include a first object characterized by a first surface property.
  • the first normal, the first albedo, and the first roughness may be associated with the first surface property.
  • the method may include decomposing the first surface property.
  • the set of coordinates may include a four-dimensional (4D) coordinate.
  • embodiments of the present invention provide many advantages over conventional techniques.
  • the present systems and methods for image processing can generate photorealistic images under arbitrary changes in viewpoints and lighting conditions based on limited image inputs (e.g., sparse viewpoints and limited light sources), allowing for enhanced efficiency and reduced memory footprint.
  • the system, trained with one or more losses, can recover the spatially-varying bidirectional reflectance distribution function (SVBRDF) parameters in a weakly-supervised manner, resulting in visually rich representations that enable immersive and interactive virtual experiences.
  • Figure 1 is a simplified block diagram illustrating a system for image processing according to embodiments of the present invention.
  • Figure 2 is a simplified diagram illustrating a camera module of a system for image processing according to embodiments of the present invention.
  • Figure 3 is a simplified diagram illustrating a representation of light field according to embodiments of the present invention.
  • Figure 4 is a simplified diagram illustrating a data flow for image processing according to embodiments of the present invention.
  • Figure 5 is a simplified flow diagram illustrating a method for image processing according to embodiments of the present invention.
  • the present invention is directed to image processing methods and techniques.
  • a plurality of images characterizing a three-dimensional (3D) scene is obtained.
  • the plurality of images is used to determine a set of coordinates associated with a ray projecting into the 3D scene.
  • the set of coordinates serves as an input to train a neural network to predict RGB radiance for rendering the 3D scene from different viewpoints with different light conditions via a machine learning process.
  • One or more losses are calculated to refine the neural network.
  • a general aspect of the present invention is to provide a new solution for generating high-fidelity free-view synthesis results with arbitrary lighting.
  • the present invention provides methods and systems that use a limited number of input images to realize photorealistic visual representations with fast rendering speed and low memory cost via a deep learning process.
  • any element in a claim that does not explicitly state “means for” performing a specified function, or “step for” performing a specific function, is not to be interpreted as a “means” or “step” clause as specified in 35 U.S.C. Section 112, Paragraph 6.
  • the use of “step of” or “act of” in the Claims herein is not intended to invoke the provisions of 35 U.S.C. 112, Paragraph 6.
  • FIG. 1 is a simplified block diagram illustrating a system 100 for image processing according to embodiments of the present invention. This diagram is merely an example, which should not unduly limit the scope of the claims. One of ordinary skill in the art would recognize many variations, alternatives, and modifications.
  • system 100 may include a camera module 110 (or other image or video capturing device), a storage 120, and a processor 130.
  • camera module 110 is configured to capture a plurality of images characterizing a three-dimensional (3D) scene and includes one or more cameras and one or more light sources.
  • the 3D scene may include an object characterized by a first surface property.
  • the one or more light sources may be positioned at various locations around a 3D scene and configured to illuminate the scene to provide various lighting conditions. For instance, the one or more light sources are focused toward an object contained in the scene and provide directional lights to illuminate the object from various angles.
  • the one or more light sources may include, without limitation, one or more studio light(s), one or more flash unit(s), one or more LED panel(s), and/or the like.
  • the direction, intensity, and color temperature of each light source may be adjustable to create different lighting scenarios.
  • light modifiers (e.g., softboxes, umbrellas, reflectors, etc.) may also be used to shape the light provided by the one or more light sources.
  • the one or more cameras are positioned at various locations around the 3D scene and configured to capture images of the scene from multiple viewpoints.
  • the one or more cameras are configured to capture a video clip including a sequence of consecutive image frames depicting the scene from various viewpoints.
  • the one or more cameras may include, without limitation, one or more RGB camera(s), one or more Digital Single-Lens Reflex (DSLR) camera(s), one or more mirrorless camera(s), one or more High Dynamic Range (HDR) camera(s), one or more image sensor(s), one or more video recorder(s), and/or the like.
  • camera module 110 may be configured in various arrangements to collect image samples characterized by a wide range of perspectives and lighting conditions.
  • the one or more cameras and light sources are placed evenly around the object or scene, at a fixed distance and elevation. This arrangement captures images from various angles in a horizontal plane, offering 360-degree coverage. In other examples, the one or more cameras and light sources are placed at various elevations and azimuth angles around the object or scene (e.g., in a spherical arrangement), creating images with multiple perspectives under different lighting conditions.
  • the plurality of images captured by camera module 110 is characterized by a plurality of camera views and a plurality of light directions.
  • storage 120 is configured to store the plurality of images captured by camera module 110.
  • Storage 120 may include, without limitation, local and/or network-accessible storage, a disk drive, a drive array, an optical storage device, and a solid-state storage device, which can be programmable, flash-updateable, and/or the like.
  • Processor 130 can be coupled to each of the previously mentioned components and be configured to communicate between these components.
  • processor 130 includes a central processing unit (CPU) 132, graphics processing unit (GPU) 134, and/or network processing unit (NPU) 136, or the like.
  • each of the processing units may include one or more processing cores for parallel processing.
  • CPU 132 includes both high-performance cores and energy-efficient cores.
  • Processor 130 is configured to process the plurality of images to generate a light field representation of the 3D scene and train a neural network to predict RGB radiance for efficient relighting and free view synthesis, as will be described in further detail below.
  • the system 100 can also include a network interface 140 and a display 150.
  • Display 150 is configured to display an output image generated by processor 130.
  • the output image may be associated with the same 3D scene and characterized by a camera view and a light direction that are different from any of the plurality of input images.
  • Network interface 140 can be configured to transmit and receive images (e.g., using Wi-Fi, Bluetooth, Ethernet, etc.) for neural network training and/or image processing.
  • the network interface 140 can also be configured to compress or down-sample images for transmission or further processing.
  • Network interface 140 can also be configured to send one or more images to a server for postprocessing.
  • the processor 130 can also be coupled to and configured to communicate between display 150, the network interface 140, and any other interfaces.
  • system 100 further includes one or more peripheral devices 160 configured to improve user interaction in various aspects.
  • peripheral devices 160 may include, without limitation, at least one of the speaker(s) or earpiece(s), audio sensor(s) or microphone(s), noise sensors, keyboard, mouse, and/or other input/output devices.
  • processor 130 can be configured to retrieve the plurality of images from storage 120; to determine a first camera view and a first light direction associated with the 3D scene using the plurality of images; to calculate a set of coordinates using the first camera view and the first light direction; to generate a first normal, a first albedo, and a first roughness using the set of coordinates and a first neural network; to generate a first color of a first pixel using the first normal, the first albedo, the first roughness, the first light direction, and a second neural network; and to provide an output image associated with the 3D scene using at least the first color.
  • GPU 134 is coupled to display 150 and camera module 110.
  • GPU 134 may be configured to transmit output images to display 150.
  • NPU 136 may be used to train one or more neural networks for image relighting and view synthesis with one or more losses. For instance, NPU 136 is configured to train the neural network(s) by minimizing a render loss between SVBRDF rendering result and the ground truth color via decomposing the first surface property of the object contained in the 3D scene.
  • FIG. 2 is a simplified diagram illustrating a camera module 200 of a system for image processing according to embodiments of the present invention. This diagram is merely an example, which should not unduly limit the scope of the claims. One of ordinary skill in the art would recognize many variations, alternatives, and modifications.
  • Camera module 200 may include one or more cameras and one or more light sources.
  • camera module 200 includes a first camera 202a, a second camera 202b, a third camera 202c, a first light source 204a, a second light source 204b, and a third light source 204c.
  • the cameras (202a, 202b, 202c) and light sources (204a, 204b, 204c) may be arranged around a scene including one or more subjects (e.g., a person 206) and are configured to capture images of the scene from various viewpoints under different light configurations.
  • the images captured by camera module 200 may be a video clip including a sequence of images from different viewpoints.
  • each camera is configured to capture an image of the scene at each light position, providing a plurality of images characterized by a plurality of camera views and a plurality of light directions. For instance, given N viewpoints and L light sources, each viewpoint is illuminated by each of the L light sources in turn, resulting in a total of N × L images (see the sketch that follows). It is to be appreciated that the number of cameras and light sources is not limited to what is shown in Figure 2, and a different number of cameras and light sources may be employed in other embodiments.
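  • For illustration only, the short Python sketch below enumerates the (viewpoint, light) pairs that index such a capture; the helper name and structure are assumptions for the example, not part of the description above.

```python
# Illustrative sketch: indexing an N x L capture (N viewpoints, L light sources).
from itertools import product

def capture_schedule(num_views: int, num_lights: int):
    """Return every (view_index, light_index) pair; N x L images in total."""
    return list(product(range(num_views), range(num_lights)))

pairs = capture_schedule(num_views=3, num_lights=3)
print(len(pairs))  # 9 images for 3 cameras and 3 light sources
```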
  • camera module 200 may include a lab-controlled device that consists of a 3D structure (e.g., spherical, cuboid, ellipsoid, cylinder, and/or the like) fitted with an array of light sources and cameras configured to surround the subject/scene. This configuration allows for precise control over lighting conditions and camera calibrations, resulting in high-quality images that are later used for training neural network(s).
  • the one or more cameras may be synchronized to capture the images from different viewpoints simultaneously. In other embodiments, the one or more cameras may be unsynchronized to capture the plurality of images sequentially.
  • the subject of the scene (e.g., person 206) may remain stationary during the image capture process.
  • Figure 3 is a simplified diagram illustrating a representation of light field 300 according to embodiments of the present invention. This diagram is merely an example, which should not unduly limit the scope of the claims. One of ordinary skill in the art would recognize many variations, alternatives, and modifications.
  • a light field representation may be used.
  • the light field representation is configured to store the light ray’s properties (e.g., color, intensity, direction, and/or the like) at each point in the 3D space as a function of position and direction.
  • a 3D scene 308 can be represented as a 4D light field using two-plane parameterization as shown in Figure 3.
  • the 4D light field parameterizes a light ray R from a camera viewpoint 302 with a known camera pose intersecting with two planes: a uv plane 304 and a st plane 306. Each point on the st plane is connected to a corresponding point on the uv plane.
  • light ray R intersects with uv plane 304 and st plane 306 at point 310 and point 312, respectively.
  • Point 310 on uv plane 304, having a coordinate (ui, vi), is connected to point 312, having a coordinate (si, ti), on st plane 306.
  • An oriented line indicating the direction of light ray R can thus be defined by connecting point 310 and point 312 and parameterized by a 4D coordinate (ui, vi, si, ti), as illustrated in the sketch below.
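  • As a concrete illustration of this two-plane parameterization, the following sketch computes the 4D coordinate (u, v, s, t) of a ray; the placement of the two planes at z = 0 and z = 1 is an assumption made only for the example.

```python
import numpy as np

def ray_to_4d(origin: np.ndarray, direction: np.ndarray,
              z_uv: float = 0.0, z_st: float = 1.0) -> np.ndarray:
    """Parameterize a ray by its intersections with two parallel planes.

    Assumes the uv plane lies at z = z_uv, the st plane at z = z_st, and the
    ray is not parallel to them; plane placement is illustrative only.
    """
    direction = direction / np.linalg.norm(direction)
    t_uv = (z_uv - origin[2]) / direction[2]  # travel to the uv plane
    t_st = (z_st - origin[2]) / direction[2]  # travel to the st plane
    u, v = (origin + t_uv * direction)[:2]    # intersection point (e.g., point 310)
    s, t = (origin + t_st * direction)[:2]    # intersection point (e.g., point 312)
    return np.array([u, v, s, t])

# A ray from a camera at z = -1 looking along +z yields (0.2, 0.1, 0.2, 0.1).
print(ray_to_4d(np.array([0.2, 0.1, -1.0]), np.array([0.0, 0.0, 1.0])))
```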
  • light field data can be processed and manipulated using machine learning algorithms to construct new views with arbitrary light direction, as will be described in further detail below.
  • FIG. 4 is a simplified diagram illustrating a data flow 400 for image processing according to embodiments of the present invention. This diagram is merely an example, which should not unduly limit the scope of the claims.
  • One of ordinary skill in the art would recognize many variations, alternatives, and modifications. For example, one or more steps may be added, removed, repeated, rearranged, replaced, modified, and/or overlapped, and they should not limit the scope of the claims.
  • the present invention provides a method to render images from novel viewpoints — viewpoints that are different from the input images — with arbitrary lighting conditions.
  • the method may include two stages implemented by two neural networks: (1) determining a light ray’s SVBRDF components (e.g., normal, albedo, and roughness), which can be used to model the appearance of materials and reflectance properties; (2) training a neural network to render images from novel viewpoints using at least the SVBRDF components.
  • flow 400 implements a two-stage network architecture to provide free viewpoint rendering results based on sparse camera views and limited light sources.
  • the two-stage network receives 4D coordinates 402 as input, which can be obtained by extracting 4D ray parameterizations of pixels in a 3D scene.
  • the 3D scene may include an object characterized by a first surface property.
  • a first camera view and a first light direction (e.g., camera viewpoint 302 and light ray R of Figure 3) may be determined using the plurality of images.
  • the first light direction may be associated with a first pixel in the 3D scene.
  • input 4D coordinates 402 may be fed to a first neural network 404 (which may also be referred to as “DecomposeNet”) to generate SVBRDF parameters.
  • SVBRDF stands for Spatially-Varying Bidirectional Reflectance Distribution Function. Unlike a uniform Bidirectional Reflectance Distribution Function (BRDF), an SVBRDF accounts for the fact that real-world surfaces have spatial variations in their reflectance properties due to imperfections like bumps, scratches, and other irregularities that impact light interaction.
  • SVBRDF parameters are a set of values that define the reflectance properties of a surface at each point and can be used to model the appearance of real-world objects with complex surface reflectance properties (e.g., a human face).
  • the first neural network 404 generates the SVBRDF parameters including a first normal 412, a first albedo 414, and a first roughness 416 via normal branch 406, albedo branch 408, and roughness branch 410, respectively.
  • a normal parameter defines the direction of a surface normal, which is a vector perpendicular to the surface at a given point. The normal parameter may be used to determine the direction in which light is reflected off the surface.
  • An albedo parameter may include a diffuse albedo and/or a specular albedo.
  • the diffuse albedo is configured to quantify the amount of light that is diffusely reflected from a surface.
  • the specular albedo is configured to measure the amount of light that is specularly reflected from a surface (i.e., in a mirror-like manner).
  • a roughness parameter is used to determine the microsurface irregularities of a material, which affects the way light is scattered on a surface.
  • the first neural network 404 includes a multilayer perceptron (MLP) comprising a series of layers of interconnected neurons. The output of each layer may be fed into the next layer to perform various classification and regression tasks.
  • the MLP network first extracts a shared feature among SVBRDF parameters and then employs three decoders (e.g., normal branch 406, albedo branch 408, and roughness branch 410) to generate SVBRDF parameters (e.g., first normal 412, first albedo 414, and first roughness 416).
  • In an example, the first neural network 404 (DecomposeNet), configured to predict the SVBRDF parameters, can be represented as (normal, albedo, roughness) = DecomposeNet(u, v, s, t), mapping the 4D ray coordinate to the decomposed surface parameters. A minimal architectural sketch follows.
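  • A minimal PyTorch sketch of such a DecomposeNet-style MLP, with a shared feature trunk and three decoder branches, is shown below; the layer widths, depths, and output activations are assumptions for illustration rather than the architecture disclosed here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecomposeNet(nn.Module):
    """Maps a 4D ray coordinate (u, v, s, t) to SVBRDF parameters (illustrative sketch)."""
    def __init__(self, hidden: int = 256):
        super().__init__()
        # Shared feature extractor over the 4D light-field coordinate.
        self.trunk = nn.Sequential(
            nn.Linear(4, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Three decoder branches: normal, albedo, and roughness.
        self.normal_head = nn.Linear(hidden, 3)
        self.albedo_head = nn.Linear(hidden, 3)
        self.rough_head = nn.Linear(hidden, 1)

    def forward(self, coords: torch.Tensor):
        feat = self.trunk(coords)
        normal = F.normalize(self.normal_head(feat), dim=-1)  # unit surface normal
        albedo = torch.sigmoid(self.albedo_head(feat))        # keep values in [0, 1]
        roughness = torch.sigmoid(self.rough_head(feat))      # keep values in [0, 1]
        return normal, albedo, roughness
```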
  • the SVBRDF parameters (e.g., first normal 412, first albedo 414, and first roughness 416) are fed into a second neural network 420 (which may also be referred to as “RenderNet”) as inputs.
  • Second neural network 420 may be configured to generate a first color of the first pixel using the first light direction, first normal 412, first albedo 414, and first roughness 416.
  • Second neural network 420 may include a multilayer perceptron (MLP) comprising a series of layers of interconnected neurons. The output of each layer may be fed into the next layer to perform various classification and regression tasks.
  • second neural network 420 is trained to utilize first normal 412, first albedo 414, first roughness 416, and light direction 422 to generate the ray color 424 via an implicit rendering process.
  • second neural network 420 is used to learn an implicit function, RenderNet(), that defines the surface of an object in the 3D scene.
  • One or more losses may be used to train first neural network 404 and second neural network 420.
  • a predicted color C_pred (e.g., ray color 424) generated by second neural network 420 may be calculated as follows: C_pred = RenderNet(DecomposeNet(r), r, l_d) (Eqn. 3), where r denotes the 4D ray coordinate and l_d denotes the light direction.
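  • Continuing the sketch above, a RenderNet-style MLP that consumes the decomposed SVBRDF parameters, the 4D ray coordinate, and the light direction to predict a ray color, in the spirit of Eqn. 3, might look as follows; the input concatenation and layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class RenderNet(nn.Module):
    """Predicts an RGB color from SVBRDF parameters, ray coordinate, and light direction."""
    def __init__(self, hidden: int = 256):
        super().__init__()
        # normal (3) + albedo (3) + roughness (1) + ray coordinate (4) + light direction (3) = 14
        self.mlp = nn.Sequential(
            nn.Linear(14, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),
        )

    def forward(self, normal, albedo, roughness, coords, light_dir):
        x = torch.cat([normal, albedo, roughness, coords, light_dir], dim=-1)
        return self.mlp(x)

def predict_color(decompose_net, render_net, coords, light_dir):
    """C_pred = RenderNet(DecomposeNet(r), r, l_d), following Eqn. 3."""
    normal, albedo, roughness = decompose_net(coords)
    return render_net(normal, albedo, roughness, coords, light_dir)
```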
  • a first loss 426 may include a photometric loss L_p configured to minimize the multi-view photometric error.
  • the photometric loss L_p can be calculated using the predicted color of a pixel and its ground-truth color, for example as L_p = Σ ||C_pred − C_gt||², summed over the sampled rays.
  • second neural network 420 may be trained using a second loss 428, which may include a microfacet renderer loss L_m.
  • the microfacet renderer loss L_m may be calculated using microfacet rendering results and the ground truth to train second neural network 420 in a weakly-supervised manner.
  • rendering layer 430 is configured to perform a microfacet rendering process to calculate the color and/or intensity of each pixel in the image (e.g., the first pixel) using first normal 412, first albedo 414, and first roughness 416.
  • the microfacet rendering process can model the reflection and refraction of light by taking into account the effects of surface roughness.
  • render layer 430 first calculates a distribution of microfacets on the surface of an object, which describes the probability of finding a microfacet with a particular orientation and size. With the known distribution of microfacets, render layer 430 can then calculate the reflection and refraction of light at the surface. The color of each pixel in the image can therefore be calculated using the microfacet distribution and the SVBRDF parameters (e.g., first normal 412, first albedo 414, and first roughness 416).
  • the second neural network 420 can be further refined by minimizing the microfacet renderer loss L_m, for example L_m = Σ ||M(normal, albedo, roughness, light direction) − C_gt||², where M is the microfacet BRDF rendering model and C_gt is the ground-truth color. A sketch of one possible choice for M follows.
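  • For illustration only, the sketch below implements one common choice for M, a Cook-Torrance microfacet model with a GGX distribution and Schlick Fresnel; the specific microfacet model is not detailed here, so this particular form is an assumption.

```python
import math
import torch
import torch.nn.functional as F

def microfacet_render(normal, albedo, roughness, light_dir, view_dir, f0=0.04):
    """Cook-Torrance shading with a GGX distribution: an illustrative stand-in for M.

    Direction tensors have shape (..., 3) and are assumed normalized; roughness has
    shape (..., 1) with values in [0, 1]. A single white directional light is assumed.
    """
    half = F.normalize(light_dir + view_dir, dim=-1)
    n_dot_l = torch.clamp((normal * light_dir).sum(-1, keepdim=True), min=1e-4)
    n_dot_v = torch.clamp((normal * view_dir).sum(-1, keepdim=True), min=1e-4)
    n_dot_h = torch.clamp((normal * half).sum(-1, keepdim=True), min=1e-4)
    v_dot_h = torch.clamp((view_dir * half).sum(-1, keepdim=True), min=1e-4)

    alpha2 = (roughness ** 2) ** 2                                        # GGX alpha = roughness^2
    d = alpha2 / (math.pi * (n_dot_h ** 2 * (alpha2 - 1.0) + 1.0) ** 2)   # normal distribution term
    k = (roughness + 1.0) ** 2 / 8.0
    g = (n_dot_l / (n_dot_l * (1.0 - k) + k)) * (n_dot_v / (n_dot_v * (1.0 - k) + k))  # geometry term
    f = f0 + (1.0 - f0) * (1.0 - v_dot_h) ** 5                            # Schlick Fresnel approximation
    specular = d * g * f / (4.0 * n_dot_l * n_dot_v)
    diffuse = albedo / math.pi
    return (diffuse + specular) * n_dot_l                                 # reflected radiance toward the camera

def microfacet_loss(normal, albedo, roughness, light_dir, view_dir, gt_color):
    """Squared error between the microfacet rendering result and the ground-truth color (L_m)."""
    pred = microfacet_render(normal, albedo, roughness, light_dir, view_dir)
    return ((pred - gt_color) ** 2).mean()
```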
  • FIG. 5 is a simplified flow diagram illustrating a method 500 for image processing according to embodiments of the present invention. This diagram is merely an example, which should not unduly limit the scope of the claims.
  • One of ordinary skill in the art would recognize many variations, alternatives, and modifications. For example, one or more steps may be added, removed, repeated, rearranged, replaced, modified, and/or overlapped, and they should not limit the scope of the claims.
  • method 500 may be performed by a computing system, such as system 100 shown in Figure 1.
  • method 500 includes step 502 of obtaining a plurality of images characterizing a three-dimensional (3D) scene.
  • the 3D scene may include a first object characterized by a first surface property.
  • the plurality of images may be captured by a camera module — such as camera module 200 shown in Figure 2 — which includes one or more cameras and one or more light sources.
  • the plurality of images may be characterized by a plurality of camera views and a plurality of light directions.
  • the plurality of images may be used as training inputs to train one or more deep learning models to generate one or more images characterized by viewpoints and lighting conditions that are different from the input images.
  • the plurality of images can provide ground-truth values (e.g., the colors of pixels) to calculate one or more loss functions to further refine the deep learning models for improved performance.
  • method 500 includes determining a first camera view and a first light direction associated with the 3D scene using the plurality of images.
  • the first camera view may be different from any of the plurality of camera views.
  • the first light direction may be different from any of the plurality of light directions.
  • the first light direction may be associated with a first pixel in the 3D scene.
  • method 500 includes calculating a set of coordinates using the first camera view and the first light direction.
  • the set of coordinates may be calculated using a light field representation as shown in Figure 3.
  • the light field representation calculates the intersection points of the light ray and two planes in 3D space.
  • the set of coordinates may include a four-dimensional (4D) coordinate, which includes the coordinates of the two intersection points on the two planes.
  • method 500 includes generating a first normal, a first albedo, and a first roughness using the set of coordinates and a first neural network.
  • the first neural network may include a fully connected network.
  • the first neural network includes a multilayer perceptron (MLP) comprising a series of layers of interconnected neurons.
  • the first neural network takes the set of coordinates as input to generate one or more SVBRDF parameters including the first normal, the first albedo, and the first roughness, which can later be used to render surfaces with complex reflectance properties.
  • the first neural network may be trained to generate SVBRDF parameters for a variety of surfaces (e.g., metals, plastic, cloth, and/or the like) and can be further refined using one or more loss functions.
  • method 500 may further include decomposing the first surface property to generate a set of SVBRDF parameters (e.g., the first normal, the first albedo, and the first roughness) that describes the surface reflectance property at each point on the surface.
  • method 500 includes generating a first color of the first pixel using the first normal, the first albedo, the first roughness, the first light direction, and a second neural network.
  • the second neural network may include a fully connected network.
  • method 500 includes providing an output image associated with the 3D scene using at least the first color.
  • the second neural network may include a multilayer perceptron (MLP) comprising a series of layers of interconnected neurons.
  • the second neural network may be trained with the SVBRDF parameters (e.g., the first normal, the first albedo, the first roughness) to render an image of the 3D scene from a novel viewpoint via an implicit rendering process by calculating the first color of the first pixel.
  • the first normal, the first albedo, and the first roughness are associated with the first surface property of the object contained in the 3D scene.
  • the output image may be characterized by a camera view that is different from any of the plurality of camera views.
  • the output image may also be characterized by a light direction that is different from any of the plurality of light directions.
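  • As an illustration of this inference step, the sketch below (reusing the DecomposeNet, RenderNet, and ray parameterization sketches above; helper names and shapes are assumptions) shades every pixel of a novel view under a chosen light direction.

```python
import torch

@torch.no_grad()
def render_novel_view(decompose_net, render_net, ray_coords, light_dir, height, width):
    """Shade a batch of pixels for a novel viewpoint and light direction (illustrative).

    ray_coords: (H*W, 4) two-plane coordinates of the camera rays for the new view.
    light_dir:  (3,) desired light direction, broadcast to every pixel.
    """
    light = light_dir.expand(ray_coords.shape[0], 3)
    normal, albedo, roughness = decompose_net(ray_coords)
    colors = render_net(normal, albedo, roughness, ray_coords, light)
    return colors.reshape(height, width, 3)  # output image for display
```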
  • method 500 may include calculating a first loss using the first color and a ground-truth color of the first pixel.
  • the first loss may be used to update the first neural network for improved performance.
  • the first loss includes a photometric loss that measures the difference between the predicted color and the ground-truth color.
  • the parameters of the first neural network may be updated by minimizing the photometric loss.
  • method 500 may further include calculating a second loss using the first color, the first normal, the first albedo, and the first roughness. The second loss may be used to update the second neural network.
  • the second loss may include a microfacet renderer loss, which is calculated using the SVBRDF parameters generated by the first neural network via a microfacet rendering process.
  • the second neural network may be further refined by enforcing the output color (e.g., the first color of the first pixel) to be close to the microfacet rendering result.
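  • Putting the pieces together, a hedged sketch of one training step that combines the photometric loss L_p and the microfacet renderer loss L_m is shown below, reusing the sketches above; the optimizer setup, batching, and loss weighting are assumptions, and a single optimizer over both networks is used for simplicity even though the description associates L_p primarily with the first network and L_m with the second.

```python
import torch

def training_step(decompose_net, render_net, optimizer, batch, lambda_m=0.1):
    """One gradient step combining L_p and L_m (illustrative; lambda_m is an assumed weight)."""
    coords = batch["coords"]          # (B, 4) ray coordinates
    light_dir = batch["light_dir"]    # (B, 3) light directions
    view_dir = batch["view_dir"]      # (B, 3) viewing directions
    gt_color = batch["gt_color"]      # (B, 3) ground-truth pixel colors

    normal, albedo, roughness = decompose_net(coords)
    pred_color = render_net(normal, albedo, roughness, coords, light_dir)

    loss_p = ((pred_color - gt_color) ** 2).mean()  # photometric loss L_p
    loss_m = microfacet_loss(normal, albedo, roughness, light_dir, view_dir, gt_color)  # L_m

    loss = loss_p + lambda_m * loss_m
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)
```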

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Computer Graphics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Geometry (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The present invention relates to image processing methods and techniques. In a specific embodiment, a plurality of images characterizing a three-dimensional (3D) scene is obtained. The plurality of images is used to determine a set of coordinates associated with a ray projecting into the 3D scene. The set of coordinates serves as an input to train a neural network to predict RGB radiance for rendering the 3D scene from different viewpoints under different lighting conditions via a machine learning process. One or more losses are calculated to refine the neural network. Other embodiments also exist.
PCT/US2023/024796 2022-06-14 2023-06-08 Methods and systems for view synthesis with image relighting WO2023244488A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202263351832P 2022-06-14 2022-06-14
US63/351,832 2022-06-14

Publications (1)

Publication Number Publication Date
WO2023244488A1 (fr)

Family

ID=89191778

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2023/024796 WO2023244488A1 (fr) 2022-06-14 2023-06-08 Procédés et systèmes de synthèse de vue avec ré-éclairage d'image

Country Status (1)

Country Link
WO (1) WO2023244488A1 (fr)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180204314A1 (en) * 2017-01-18 2018-07-19 Nvidia Corporation Filtering image data using a neural network
US20210295592A1 (en) * 2015-11-30 2021-09-23 Photopotech LLC Methods for Collecting and Processing Image Information to Produce Digital Assets
WO2022098358A1 (fr) * 2020-11-05 2022-05-12 Google Llc Capture de performance volumétrique à l'aide d'un rendu neuronal


Similar Documents

Publication Publication Date Title
US11210838B2 (en) Fusing, texturing, and rendering views of dynamic three-dimensional models
CN107993216B (zh) 一种图像融合方法及其设备、存储介质、终端
US11386633B2 (en) Image augmentation for analytics
WO2019237299A1 (fr) Modification et capture de visage 3d à l'aide de réseaux neuronaux de suivi temporel et d'une image
US11710287B2 (en) Generative latent textured proxies for object category modeling
CN108475327A (zh) 三维采集与渲染
JP2016537901A (ja) ライトフィールド処理方法
US20230154101A1 (en) Techniques for multi-view neural object modeling
JP2023521270A (ja) 多様なポートレートから照明を学習すること
US9208606B2 (en) System, method, and computer program product for extruding a model through a two-dimensional scene
US11451758B1 (en) Systems, methods, and media for colorizing grayscale images
WO2022098358A1 (fr) Capture de performance volumétrique à l'aide d'un rendu neuronal
Sevastopolsky et al. Relightable 3d head portraits from a smartphone video
Kang et al. View-dependent scene appearance synthesis using inverse rendering from light fields
CN116109974A (zh) 体积视频展示方法以及相关设备
WO2023244488A1 (fr) Procédés et systèmes de synthèse de vue avec ré-éclairage d'image
RU2757563C1 (ru) Способ визуализации 3d портрета человека с измененным освещением и вычислительное устройство для него
US20240212106A1 (en) Photo Relighting and Background Replacement Based on Machine Learning Models
US20240020901A1 (en) Method and application for animating computer generated images
CN116193093A (zh) 视频制作方法、装置、电子设备及可读存储介质
Mihut et al. Lighting and Shadow Techniques for Realistic 3D Synthetic Object Compositing in Images
CN116311187A (zh) 对象材质识别方法、装置、电子设备及存储介质
CN114631127A (zh) 说话头的小样本合成
CN117557714A (zh) 三维重建方法、电子设备及可读存储介质
CN116187424A (zh) 一种基于样本训练的活体检测模型训练与使用方法和装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23824440

Country of ref document: EP

Kind code of ref document: A1