NZ794397B2 - Techniques for multi-view neural object modeling - Google Patents

Techniques for multi-view neural object modeling

Info

Publication number
NZ794397B2
Authority
NZ
New Zealand
Prior art keywords
color
code
computer
images
machine learning
Prior art date
Application number
NZ794397A
Other versions
NZ794397A (en)
Inventor
Daoye Wang
Derek Edward Bradley
Gaspard Zoss
Paulo Fabiano Urnau Gotardo
Prashanth Chandran
Original Assignee
Disney Enterprises Inc
ETH Zürich (Eidgenössische Technische Hochschule Zürich)
Filing date
Publication date
Priority claimed from US17/983,246 external-priority patent/US12236517B2/en
Application filed by Disney Enterprises Inc, ETH Zürich (Eidgenössische Technische Hochschule Zürich) filed Critical Disney Enterprises Inc
Publication of NZ794397A publication Critical patent/NZ794397A/en
Publication of NZ794397B2 publication Critical patent/NZ794397B2/en

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/40 Filling a planar surface by adding surface attributes, e.g. colour or texture
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/04 Texture mapping
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/06 Ray-tracing
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/08 Volume rendering
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/60 Analysis of geometric attributes
    • G06T7/62 Analysis of geometric attributes of area, perimeter, diameter or volume

Abstract

Techniques are disclosed for generating photorealistic images of objects, such as heads, from multiple viewpoints. In some embodiments, a morphable radiance field (MoRF) model that generates images of heads includes an identity model that maps an identifier (ID) code associated with a head into two codes: a deformation ID code encoding a geometric deformation from a canonical head geometry, and a canonical ID code encoding a canonical appearance within a shape-normalized space. The MoRF model also includes a deformation field model that maps a world space position to a shape-normalized space position based on the deformation ID code. Further, the MoRF model includes a canonical neural radiance field (NeRF) model that includes a density multi-layer perceptron (MLP) branch, a diffuse MLP branch, and a specular MLP branch that output densities, diffuse colors, and specular colors, respectively. The MoRF model can be used to render images of heads from various viewpoints.
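The model structure described in the abstract can be sketched in code. The following is an illustrative NumPy sketch only, not the patented implementation: all names (`morf_query`, `id_to_deform`, `deform_field`, etc.), layer sizes, and the use of single random linear layers in place of trained MLPs are assumptions for exposition.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(in_dim, out_dim):
    """A single random linear layer standing in for a trained MLP."""
    W = rng.standard_normal((in_dim, out_dim)) * 0.1
    return lambda x: x @ W

# Identity model: ID code -> (deformation ID code, canonical ID code)
id_to_deform = linear(32, 16)
id_to_canon = linear(32, 16)

# Deformation field: (world position, deformation ID code) -> shape-normalized position
deform_field = linear(3 + 16, 3)

# Canonical NeRF: shared trunk, then density / diffuse / specular branches
trunk = linear(3 + 16, 64)
density_branch = linear(64, 1)
diffuse_branch = linear(64, 3)
specular_branch = linear(64, 3)

def morf_query(world_pos, id_code):
    """Query the sketch MoRF at one world-space position for one identity."""
    deform_code = id_to_deform(id_code)
    canon_code = id_to_canon(id_code)
    canon_pos = deform_field(np.concatenate([world_pos, deform_code]))
    h = np.tanh(trunk(np.concatenate([canon_pos, canon_code])))
    density = np.exp(density_branch(h))              # non-negative density
    diffuse = 1 / (1 + np.exp(-diffuse_branch(h)))   # colors squashed to (0, 1)
    specular = 1 / (1 + np.exp(-specular_branch(h)))
    return density[0], diffuse, specular

density, diffuse, specular = morf_query(np.zeros(3), rng.standard_normal(32))
```

The key structural point the sketch illustrates is the factoring: identity information enters only through the two ID codes, so the deformation field and the canonical NeRF are shared across all identities.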

Claims (18)

WHAT IS CLAIMED IS:
1. A computer-implemented method for rendering an image of an object, the method comprising:
tracing a ray through a pixel into a virtual scene;
sampling one or more positions along the ray;
applying a machine learning model to the one or more positions and an identifier (ID) code associated with an object to determine, for each position included in the one or more positions, a density, a diffuse color, and a specular color; and
computing a color of the pixel based on the density, the diffuse color, and the specular color corresponding to each position included in the one or more positions;
wherein the machine learning model comprises an identity model that maps the ID code to (i) a deformation ID code that encodes a geometric deformation from a canonical object geometry, and (ii) a canonical ID code that encodes an appearance within a space associated with the canonical object geometry.
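The four recited steps (trace, sample, query, composite) can be sketched as follows. This is a hedged illustration under assumed details: `toy_model` is a constant-valued placeholder for the claimed machine learning model, and the compositing uses standard NeRF-style volume-rendering quadrature, which the claim does not mandate.

```python
import numpy as np

def sample_ray(origin, direction, near=0.5, far=2.0, n_samples=64):
    """Steps 1-2: trace a ray through a pixel and sample positions along it."""
    direction = direction / np.linalg.norm(direction)
    t = np.linspace(near, far, n_samples)
    return t, origin + t[:, None] * direction

def toy_model(positions, id_code):
    """Step 3 placeholder: a density, diffuse color, and specular color per position."""
    n = len(positions)
    density = np.ones(n)
    diffuse = np.full((n, 3), 0.6)
    specular = np.full((n, 3), 0.1)
    return density, diffuse, specular

def render_pixel(origin, direction, id_code):
    """Step 4: composite per-sample outputs into one pixel color."""
    t, positions = sample_ray(origin, direction)
    density, diffuse, specular = toy_model(positions, id_code)
    delta = np.diff(t, append=t[-1] + (t[-1] - t[-2]))  # inter-sample distances
    alpha = 1.0 - np.exp(-density * delta)              # per-sample opacity
    transmittance = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = transmittance * alpha
    return weights @ (diffuse + specular)               # sum of diffuse and specular

color = render_pixel(np.zeros(3), np.array([0.0, 0.0, 1.0]), id_code=None)
```

With the constant toy field, every channel of `color` comes out equal and strictly between 0 and 1, since the accumulated weights sum to less than one.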
2. The computer-implemented method of claim 1, wherein the machine learning model comprises a neural radiance field (NeRF) model that comprises a multi-layer perceptron (MLP) trunk, a first MLP branch that computes densities, a second MLP branch that computes diffuse colors, and a third MLP branch that computes specular colors.
3. The computer-implemented method of any preceding claim, wherein computing the color of the pixel comprises:
averaging the diffuse color corresponding to each position included in the one or more positions based on the density corresponding to the position to determine an averaged diffuse color;
averaging the specular color corresponding to each position included in the one or more positions based on the density corresponding to the position to determine an averaged specular color; and
computing the color of the pixel based on the averaged diffuse color and the averaged specular color.
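The separate density-based averaging of diffuse and specular colors in this claim can be sketched as below. The normalized-density weighting is one assumed reading of "based on the density"; the helper name `density_weighted_average` and the sample values are hypothetical.

```python
import numpy as np

def density_weighted_average(densities, colors):
    """Average per-sample colors with weights proportional to the densities."""
    w = densities / densities.sum()
    return w @ colors

densities = np.array([0.2, 1.0, 0.3])
diffuse = np.array([[0.9, 0.1, 0.1],
                    [0.8, 0.2, 0.2],
                    [0.7, 0.3, 0.3]])
specular = np.array([[0.05, 0.05, 0.05]] * 3)

# Diffuse and specular are averaged independently, then combined into the pixel color.
avg_diffuse = density_weighted_average(densities, diffuse)
avg_specular = density_weighted_average(densities, specular)
pixel = np.clip(avg_diffuse + avg_specular, 0.0, 1.0)
```

Keeping the two averaged components separate until the final combination is what allows the diffuse and specular branches of the model to be supervised and manipulated independently.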
4. The computer-implemented method of any preceding claim, further comprising training the machine learning model based on a set of images of one or more objects that are captured from a plurality of viewpoints.
5. The computer-implemented method of claim 4, wherein the set of images includes a first set of images that include diffuse colors and specular information and a second set of images that include the diffuse colors.
6. The computer-implemented method of claim 4 or 5, wherein the machine learning model is further trained based on a generated set of images of the one or more objects from another plurality of viewpoints.
7. The computer-implemented method of any preceding claim, further comprising fitting at least one of the ID code or the machine learning model to one or more images of another object.
8. The computer-implemented method of claim 7, further comprising fitting the at least one of the ID code or the machine learning model to geometry associated with the another object.
9. The computer-implemented method of any preceding claim, wherein the object is a head.
10. One or more computer-readable storage media including instructions that, when executed by one or more processing units, cause the one or more processing units to perform steps for rendering an image of an object, the steps comprising:
tracing a ray through a pixel into a virtual scene;
sampling one or more positions along the ray;
applying a machine learning model to the one or more positions and an identifier (ID) code associated with an object to determine, for each position included in the one or more positions, a density, a diffuse color, and a specular color; and
computing a color of the pixel based on the density, the diffuse color, and the specular color corresponding to each position included in the one or more positions;
wherein the machine learning model comprises an identity model that maps the ID code to (i) a deformation ID code that encodes a geometric deformation from a canonical object geometry, and (ii) a canonical ID code that encodes an appearance within a space associated with the canonical object geometry.
11. The one or more computer-readable storage media of claim 10, wherein the machine learning model comprises a neural radiance field (NeRF) model that comprises a multi-layer perceptron (MLP) trunk, a first MLP branch that computes densities, a second MLP branch that computes diffuse colors, and a third MLP branch that computes specular colors.
12. The one or more computer-readable storage media of any one of claims 10 or 11, wherein computing the color of the pixel comprises:
averaging the diffuse color corresponding to each position included in the one or more positions based on the density corresponding to the position to determine an averaged diffuse color;
averaging the specular color corresponding to each position included in the one or more positions based on the density corresponding to the position to determine an averaged specular color; and
computing the color of the pixel based on the averaged diffuse color and the averaged specular color.
13. The one or more computer-readable storage media of any of claims 10 to 12, wherein the instructions, when executed by the one or more processing units, further cause the one or more processing units to perform the step of training the machine learning model based on a set of images of one or more objects that are captured from a plurality of viewpoints.
14. The one or more computer-readable storage media of claim 13, wherein the set of images includes a first set of images that include diffuse colors and specular information and a second set of images that include the diffuse colors.
15. The one or more computer-readable storage media of any of claims 10 to 14, wherein the instructions, when executed by the one or more processing units, further cause the one or more processing units to perform the step of fitting at least one of the ID code or the machine learning model to one or more images of another object.
16. The one or more computer-readable storage media of claim 15, wherein the instructions, when executed by the one or more processing units, further cause the one or more processing units to perform the step of fitting the at least one of the ID code or the machine learning model to geometry associated with the another object.
17. A computer-implemented method for training a machine learning model, the method comprising:
receiving a first set of images of one or more objects that are captured from a plurality of viewpoints;
generating a second set of images of the one or more objects from another plurality of viewpoints; and
training, based on the first set of images and the second set of images, a machine learning model, wherein the machine learning model comprises a neural radiance field model and an identity model, and wherein the identity model maps an identifier (ID) code to (i) a deformation ID code that encodes a geometric deformation from a canonical object geometry, and (ii) a canonical ID code that encodes an appearance within a space associated with the canonical object geometry.
18. The method of claim 17, wherein the training is based on at least one of a rendering loss, a deformation loss, a density loss, or an ID loss.
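Claim 18 names four loss terms; a common way to combine such terms is a weighted sum, sketched below. The specific formulas (mean-squared rendering and deformation errors, an L1 sparsity penalty on density, an L2 penalty on the ID code) and all weight values are assumptions for illustration, not the claimed training objective.

```python
import numpy as np

def total_loss(pred_rgb, gt_rgb, deform_residual, density, id_code,
               w_render=1.0, w_deform=0.1, w_density=0.01, w_id=0.001):
    """Weighted sum of the four losses named in claim 18 (weights are assumed)."""
    rendering_loss = np.mean((pred_rgb - gt_rgb) ** 2)  # photometric error vs. ground truth
    deformation_loss = np.mean(deform_residual ** 2)    # penalize large deformations
    density_loss = np.mean(np.abs(density))             # encourage sparse density
    id_loss = np.mean(id_code ** 2)                     # keep ID codes well-conditioned
    return (w_render * rendering_loss + w_deform * deformation_loss
            + w_density * density_loss + w_id * id_loss)
```

With all inputs zero the loss is zero, and each term contributes in proportion to its weight, so the relative weights control the trade-off between image fidelity and regularization of the deformation, density, and identity codes.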
NZ794397A 2022-11-15 Techniques for multi-view neural object modeling NZ794397B2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202163280101P 2021-11-16 2021-11-16
US17/983,246 US12236517B2 (en) 2021-11-16 2022-11-08 Techniques for multi-view neural object modeling

Publications (2)

Publication Number Publication Date
NZ794397A NZ794397A (en) 2025-05-30
NZ794397B2 true NZ794397B2 (en) 2025-09-02

