CN115699094A - System and application for generating synthetic data for synthesizing high resolution 3D shapes from low resolution representations

Info

Publication number
CN115699094A
Authority
CN
China
Prior art keywords: grid, SDF, representation, updated, processor
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number
CN202280003721.0A
Other languages: Chinese (zh)
Inventors: 沈天畅 (Tianchang Shen), 高俊 (Jun Gao), 尹康学 (Kangxue Yin), 刘洺堉 (Ming-Yu Liu), S·菲德勒 (S. Fidler)
Current Assignee: Nvidia Corp (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Nvidia Corp
Priority date (the priority date is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed)
Application filed by Nvidia Corp
Priority claimed from PCT/US2022/024306, WO2022250796A1
Publication of CN115699094A

Landscapes

  • Image Generation (AREA)

Abstract

In various examples, a deep three-dimensional (3D) conditional generative model is implemented that can synthesize high-resolution 3D shapes from simple guides, such as coarse voxels, point clouds, etc., by combining implicit and explicit 3D representations into a hybrid 3D representation. The method can directly optimize the reconstructed surface, allowing the synthesis of finer geometric details with fewer artifacts. The systems and methods described herein may use a deformable tetrahedral grid that encodes a discretized Signed Distance Function (SDF) and a differentiable marching tetrahedra layer that converts the implicit SDF representation into an explicit surface mesh representation. This combination allows joint optimization of the surface geometry and topology, as well as generation of a subdivision hierarchy, using reconstruction and adversarial losses defined explicitly on the surface mesh.

Description

System and application for generating synthetic data for synthesizing high resolution 3D shapes from low resolution representations
Background
Fields such as simulation, architecture, gaming, and film all rely on high quality three-dimensional (3D) content with rich geometric details and topology. However, creating high quality 3D shapes suitable for such applications requires a significant amount of development time, computation, and memory, typically for each individual shape. In contrast, creating coarse 3D shapes (e.g., with voxels, blocks, sparse point clouds, etc.) consumes less time, computation, and memory, and has been widely adopted by all types of users, including those who may not have 3D modeling expertise.
Powerful 3D representations are key components of learning-based 3D content creation frameworks. For example, a good 3D representation for high quality reconstruction and synthesis should be able to capture local geometric details and represent objects with arbitrary topology, while also being memory and computationally efficient for fast inference in interactive, near real-time, and/or real-time applications. To achieve this goal, previous approaches have used neural implicit representations, in which a neural network represents the Signed Distance Field (SDF) and/or the Occupancy Field (OF) of a shape. However, most existing implicit methods are trained by regression to SDF or occupancy values, and no explicit supervision can be applied on the underlying surface (which would allow useful constraints beneficial to training), resulting in artifacts when synthesizing fine details. To alleviate this problem, some existing methods extract a surface mesh from the implicit representation using an iso-surfacing technique, such as the Marching Cubes (MC) algorithm, a computationally expensive method that relies heavily on the grid resolution used in MC. Running iso-surfacing at a limited resolution introduces quantization errors in the geometry and topology of the surface. Thus, existing implicit methods either use implicit representations that result in lower quality shape synthesis, or use a combination of implicit representations and explicit iso-surfacing techniques that are computationally expensive and dependent on grid resolution, making these methods unsuitable for high quality shape synthesis in interactive, near real-time, or real-time applications.
Some previous methods include voxel-based methods that represent 3D shapes as voxels, which store coarse occupancy (inside/outside) values on a regular grid. For high resolution shape synthesis, generative adversarial networks have been used to transfer geometric details from high resolution voxel shapes to low resolution shapes by using discriminators defined on 3D patches of the voxel grid. However, as resolution increases, computation and memory costs increase cubically, prohibiting the reconstruction of fine geometric details and smooth curves.
Other previous methods use surface-based methods to directly predict triangular meshes. Typically, surface-based methods assume that the topology of the shape is predefined, and accuracy may be lost for objects with complex topological variations. Furthermore, similar to voxel-based methods, the computational cost increases cubically with grid resolution. In addition, the meshes generated by previous methods may contain topological errors, such as non-manifold vertices and edges, due to self-intersection of the mesh faces.
Disclosure of Invention
Embodiments of the present disclosure relate to high resolution shape synthesis for deep learning systems and applications. The disclosed systems and methods use a deep 3D conditional generative model to generate high resolution 3D shapes from lower resolution 3D guides, e.g., coarse voxels, sparse point clouds, scans, etc. A differentiable shape representation can be generated that combines both implicit and explicit 3D representations and optimizes the reconstructed surface of the 3D shape to produce higher quality shapes with finer geometric details than previous methods that optimize predicted SDF or occupancy values. For example, in contrast to methods that deform a template mesh of predefined topology, the systems and methods of the present disclosure produce shapes with arbitrary topology. In particular, for example and without limitation, an implicit function encoded on a deformable tetrahedral grid can be predicted to parameterize the underlying 2-manifold, and the 2-manifold can be converted to an explicit mesh by using a Marching Tetrahedra (MT) algorithm. The MT algorithm can be differentiable and performs better than previous MC methods. The system can maintain efficiency by learning to adapt the resolution of the grid, deforming and selectively subdividing tetrahedra, e.g., by focusing computation only on relevant regions of space. In contrast to octree-based shape synthesis, the network of the present disclosure learns grid deformation and subdivision jointly to better represent the surface, without relying on explicit supervision from a pre-computed hierarchy. The deep 3D conditional generative model may be end-to-end differentiable, allowing the network to jointly optimize the geometry and topology of the surface, as well as the hierarchy of subdivisions, using loss functions defined explicitly on the surface mesh. Furthermore, previous methods claim that singularities in the MC formulation prevent topological changes during training, a limitation overcome by the present systems and methods. For example, the 3D representation of the present systems and methods scales to high resolution without requiring additional modifications to the backward pass. Furthermore, the deep 3D conditional generative model has the ability to represent arbitrary topologies and directly optimizes the surface reconstruction to alleviate these problems.
Drawings
The present systems and methods for high resolution shape synthesis for deep learning systems and applications are described in detail below with reference to the accompanying drawings, in which:
FIG. 1 is a data flow diagram illustrating a process of three-dimensional (3D) shape synthesis and reconstruction in accordance with some embodiments of the present disclosure;
FIG. 2A illustrates an example of volume subdivision of a tetrahedron, in accordance with some embodiments of the present disclosure;
FIG. 2B illustrates an example visualization of surface estimates with and without volume subdivision, in accordance with some embodiments of the present disclosure;
FIG. 3 illustrates an example of identifying the locations of the vertices of an iso-surface, in accordance with some embodiments of the present disclosure;
FIGS. 4A-4B illustrate graphs indicating compute and memory resource requirements with and without selective volume subdivision, according to some embodiments of the present disclosure;
FIG. 5 is a flow chart illustrating a method for high resolution shape synthesis according to some embodiments of the present disclosure;
FIG. 6 is a block diagram of a computing device suitable for use in implementing some embodiments of the present disclosure; and
FIG. 7 is a block diagram of an example data center suitable for use in implementing some embodiments of the present disclosure.
Detailed Description
Systems and methods relating to high resolution shape synthesis for deep learning systems and applications are disclosed. The systems and methods described herein may be used for a variety of purposes, by way of example and not limitation, for machine control, machine motion, machine driving, synthetic data generation, model training, perception, augmented reality, virtual reality, mixed reality, robotics, security and surveillance, autonomous or semi-autonomous machine applications, deep learning, environmental simulation, data center processing, conversational artificial intelligence, light transport simulation (e.g., ray tracing, path tracing, etc.), collaborative content creation for 3D assets, cloud computing, and/or any other suitable application.
The disclosed embodiments may be included in a variety of different systems, such as automotive systems (e.g., a control system for an autonomous or semi-autonomous machine, a perception system for an autonomous or semi-autonomous machine), systems implemented using a robot, aerial systems, medical systems, boating systems, smart area monitoring systems, systems for performing deep learning operations, systems for performing simulation operations, systems implemented using an edge device, systems incorporating one or more Virtual Machines (VMs), systems for performing synthetic data generation operations, systems implemented at least in part in a data center, systems for performing conversational artificial intelligence operations, systems for performing light transport simulation, systems for performing collaborative content creation for 3D assets, systems implemented at least in part using cloud computing resources, and/or other types of systems. Although primarily described herein with respect to the creation, synthesis, or reconstruction of 3D shapes or content, this is not meant to be limiting, and the systems and methods of the present disclosure may be used for the creation, synthesis, or reconstruction of two-dimensional (2D) shapes or content without departing from the scope of the present disclosure.
Referring to fig. 1, fig. 1 is a data flow diagram illustrating a process 100 for 3D shape synthesis and reconstruction in accordance with some embodiments of the present disclosure. It should be understood that this and other arrangements described herein are set forth only as examples. Other arrangements and elements (e.g., machines, interfaces, functions, orders, groupings of functions, etc.) can be used in addition to or instead of those shown, and some elements may be omitted altogether. Further, many of the elements described herein are functional entities that may be implemented as discrete or distributed components, or in conjunction with other components, in any suitable combination and location. Various functions described herein as being performed by an entity may be carried out by hardware, firmware, and/or software. For example, various functions may be performed by a processor executing instructions stored in a memory. In some embodiments, one or more of the components, features, and/or functions may be similar to those of the example computing device 600 of fig. 6 and/or the example data center 700 of fig. 7.
The process 100 may be used to synthesize or reconstruct high quality 3D shapes and objects. To generate a 3D shape, input data representing one or more inputs 102 may be received and/or generated. The input 102 may include a point cloud (e.g., a sparse point cloud, in embodiments), a voxelized shape (e.g., a coarse voxelized shape), a scan (e.g., a 3D scan), and/or another, e.g., lower quality, type of input 102. This input may be processed using one or more machine learning models, such as, but not limited to, a deep 3D conditional generative model for high resolution shape synthesis, as represented in FIG. 1 at (A)-(E). For example, the input 102 may be processed using the model for: (A) predicting a Signed Distance Field (SDF) at an initial grid resolution; (B) selectively subdividing the tetrahedra of the grid and interpolating the updated SDF of the subdivided grid; (C) refining the SDF and the deformations near the surface, and pruning the graph; (D) performing marching tetrahedra on the interpolated SDF to generate a triangular mesh; and (E) converting the triangular mesh into a parameterized surface using differentiable surface subdivision. For example, operations (A)-(C) may be performed to generate the implicit function 104, operation (D) may be performed to generate the explicit surface 106, and at (E) surface subdivision may be performed to generate one or more outputs 108 (e.g., high quality 3D shapes or objects).
The model of process 100 may use a hybrid 3D representation designed for high resolution reconstruction and synthesis. The 3D representation may be an SDF encoded on a deformable tetrahedral grid. The grid may be a cube fully tessellated into tetrahedral cells, where each cell in the volume is a tetrahedron with four vertices and four faces. A benefit of this representation is that the grid vertices can be deformed to represent the geometry of the shape more efficiently. Furthermore, in embodiments, signed distance values may be defined on the vertices of the grid to implicitly represent the underlying surface, rather than encoding occupancy on each tetrahedron as in previous methods. The use of signed distance values, rather than occupancy, may provide more flexibility in representing the underlying surface. A deformable tetrahedral grid may thus be used as an approximation of the implicit function. The deformable tetrahedral grid may be denoted $(V_T, T)$, where $V_T$ are the vertices of the tetrahedral grid $T$. Each tetrahedron $T_k \in T$, with $k \in \{1, \ldots, K\}$ and $K$ the total number of tetrahedra, may be represented by four vertices $\{v_{a_k}, v_{b_k}, v_{c_k}, v_{d_k}\}$. The SDF may be represented by interpolating SDF values defined at the vertices of the grid. For example, the SDF value at vertex $v_i \in V_T$ may be denoted $s(v_i)$. The SDF value of a point located inside a tetrahedron may be computed by barycentric interpolation of the SDF values of the four vertices enclosing that point.
To further increase flexibility while keeping memory and computation controllable, the tetrahedra around the predicted surface may be subdivided, e.g., using selective subdivision. In this way, the shape can be represented in a coarse-to-fine manner for efficiency. A surface tetrahedron, in $T_{surf}$, may be identified by checking whether a tetrahedron has vertices with different SDF signs (e.g., one positive, one negative), indicating that the tetrahedron intersects the surface encoded by the SDF. These surface tetrahedra $T_{surf}$ may be subdivided, and, in embodiments, the neighbors of the surface tetrahedra may also be subdivided. Resolution may be increased by adding a midpoint on each edge, as shown in FIG. 2A, where each surface tetrahedron $T_{surf}$ 202 is divided into eight tetrahedra by adding midpoints 204 (e.g., 204A-204F) between the original vertices 206 (e.g., 206A (or $v_a$), 206B (or $v_b$), 206C (or $v_c$), and 206D (or $v_d$)). The SDF value at each new vertex may then be computed by averaging the SDF values of the two vertices on its edge (e.g., if the SDF values at the original vertices are -2 and +4, the midpoint SDF value is +1).
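An illustrative sketch of this subdivision step follows (names and data layout are assumptions; assembling the eight child tetrahedra from the midpoints is omitted for brevity). A tetrahedron may be flagged as a surface tetrahedron when its four vertex SDF values do not all share the same sign:

```python
import numpy as np

def subdivide_surface_tets(verts, sdf, tets, surf_mask):
    """Add edge midpoints for tetrahedra flagged as crossing the surface.

    The midpoint of each edge receives the average of the two endpoint
    SDF values (e.g., endpoints at -2 and +4 give a midpoint SDF of +1).
    Returns extended vertex/SDF arrays and an edge -> new-vertex-id map.
    """
    verts, sdf = list(verts), list(sdf)
    edge_to_new = {}
    for tet, is_surf in zip(tets, surf_mask):
        if not is_surf:
            continue  # non-surface tetrahedra stay at coarse resolution
        a, b, c, d = tet
        for i, j in [(a, b), (a, c), (a, d), (b, c), (b, d), (c, d)]:
            key = (min(i, j), max(i, j))
            if key not in edge_to_new:
                verts.append(0.5 * (np.asarray(verts[i]) + np.asarray(verts[j])))
                sdf.append(0.5 * (sdf[i] + sdf[j]))
                edge_to_new[key] = len(verts) - 1
    return np.asarray(verts), np.asarray(sdf), edge_to_new
```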
Fig. 2B shows the result of volume subdivision along surface tetrahedra compared to not using volume subdivision. For example, the visualization 230 includes a portion 236 of the estimated surface along with the ground truth surface 238, where the portion 236 of the estimated surface does not capture the contour of the ground truth surface 238. However, the visualization 232 includes a portion 236 of the estimated surface after the volume subdivision and before the local updating of the vertex position and the SDF, whereas the visualization 234 includes an updated portion 240 of the estimated surface after the volume subdivision and after the updating of the vertex position and the SDF. The updated portion 240 of the estimated surface is closer to the contour of the ground truth surface 238, resulting in a more accurate implicit representation of the object.
The signed distance-based implicit representation, e.g., after subdivision, may be converted into a triangular mesh using a marching tetrahedra layer, and the mesh may be converted into a parameterized surface using a differentiable surface subdivision module. For example, the encoded SDF may be converted into an explicit triangular mesh using the Marching Tetrahedra (MT) algorithm. Given the SDF values of a tetrahedron's vertices, $\{s(v_a), s(v_b), s(v_c), s(v_d)\}$, the MT algorithm determines the surface topology inside the tetrahedron based on the signs of $s(v)$, as shown in FIG. 3. In such an example, the total number of configurations is $2^4 = 16$, which fall into three unique cases after considering rotational symmetry. Once the surface topology inside the tetrahedron is identified, the locations of the vertices of the iso-surface can be computed at the zero crossings of the linear interpolation along the tetrahedron's edges, as shown in FIG. 3. In one or more embodiments, the zero crossing is only evaluated when $\operatorname{sign}(s(v_a)) \neq \operatorname{sign}(s(v_b))$; thus, singularities in the formula (e.g., when $s(v_a) = s(v_b)$) are avoided, and the gradient from a loss defined on the extracted iso-surface may be back-propagated to both the vertex positions and the SDF values, e.g., via the chain rule.
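For illustration, a common formulation of the zero crossing on an edge is sketched below; the exact expression used in embodiments may differ:

```python
def edge_zero_crossing(v_a, v_b, s_a, s_b):
    """Iso-surface vertex on edge (v_a, v_b) via linear interpolation.

    Assumes sign(s_a) != sign(s_b), so the denominator is nonzero:
        v' = (v_a * s_b - v_b * s_a) / (s_b - s_a)
    With s_a = 0 this returns v_a; with s_b = 0 it returns v_b.
    """
    return (v_a * s_b - v_b * s_a) / (s_b - s_a)
```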
Differentiable surface subdivision may be performed on the triangular mesh to improve the representational power and visual quality of the shape. Instead of using a fixed set of parameters for subdivision, a loop subdivision method may be performed using learnable parameters. In particular, the learnable parameters may include the position of each mesh vertex $v'_i$, as well as $\alpha_i$, which controls the generated surface by weighting the smoothness of neighboring vertices. In contrast to previous approaches, and to save computational resources, the parameters of each vertex may be predicted only at the beginning and carried through subsequent subdivision iterations. The result may be an explicit surface 106 that may be used to generate the output 108, e.g., a shape or object represented using a parameterized surface.
In a non-limiting embodiment, a Deep Neural Network (DNN) that may be used to generate the output 108 may include a deep 3D conditional generative model. For example, the DNN may use the hybrid 3D representation described herein to learn to output a high resolution 3D mesh $M$ from an input $x$, which may include a point cloud, a coarse voxelized shape, a scan, and/or the like. For example, the DNN may include one or more modules, each of which is tasked with computing an intermediate or final output toward generating the 3D mesh $M$ from the input $x$.
In some embodiments, as shown in FIG. 1, the model may include one or more machine learning models that perform the initial SDF prediction 110. Thus, the model may include an input encoder that extracts a 3D feature volume $F_{vol}(x)$ from the point cloud. When the input 102 is not a point cloud but, for example, a coarse voxelized shape, the surface of the voxelized shape may be sampled to generate a point cloud. A feature vector $F_{vol}(v, x)$ may then be generated for each grid vertex $v \in V_T$ via trilinear interpolation of the feature volume. The initial prediction of the SDF value for each vertex in the initial deformable tetrahedral grid may use, for example, a fully connected network, $s(v) = \mathrm{MLP}(F_{vol}(v, x), v)$. The fully connected network may additionally output a feature vector $f(v)$, which may be used for surface refinement during the volume subdivision stage.
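A minimal sketch of this stage, assuming a PyTorch-style encoder output; module and variable names are hypothetical:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InitialSDFPredictor(nn.Module):
    """Per-vertex SDF prediction from a 3D feature volume (sketch).

    The encoder producing `feat_vol` of shape (B, C, D, H, W) is assumed
    to exist elsewhere; layer widths are placeholders.
    """
    def __init__(self, feat_dim: int, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, 1 + feat_dim),  # SDF value + per-vertex feature f(v)
        )

    def forward(self, feat_vol, verts):
        # verts: (B, N, 3) normalized to [-1, 1]; grid_sample on a 5D input
        # performs trilinear interpolation of the feature volume
        grid = verts.view(verts.shape[0], 1, 1, -1, 3)
        f = F.grid_sample(feat_vol, grid, align_corners=True)   # (B, C, 1, 1, N)
        f = f.view(feat_vol.shape[0], feat_vol.shape[1], -1).permute(0, 2, 1)
        out = self.mlp(torch.cat([f, verts], dim=-1))
        sdf, vert_feat = out[..., :1], out[..., 1:]
        return sdf, vert_feat
```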
After obtaining the initial SDF, surface refinement 112 may be performed to iteratively refine the surface and subdivide the tetrahedral grid. For example, the surface tetrahedra $T_{surf}$ may be identified from the current $s(v)$ values, and a graph $G = (V_{surf}, E_{surf})$ may be generated, where $V_{surf}$ and $E_{surf}$ correspond to the vertices and edges of $T_{surf}$. A position offset $\Delta v_i$ and an SDF residual value $\Delta s(v_i)$ may be predicted for each vertex in $V_{surf}$ using, for example, a graph convolutional network (GCN), as expressed in the following equations (1) and (2):

$$f'_{v_i} = \mathrm{concat}\big(v_i,\; s(v_i),\; F_{vol}(v_i, x),\; f(v_i)\big) \qquad (1)$$

$$\big(\Delta v_i,\; \Delta s(v_i),\; \overline{f}(v_i)\big)_{i=1,\ldots,N_{surf}} = \mathrm{GCN}\big((f'_{v_i})_{i=1,\ldots,N_{surf}},\; G\big) \qquad (2)$$

where $N_{surf}$ is the total number of vertices in $V_{surf}$ and $\overline{f}(v_i)$ is the updated feature of each vertex. The vertex position and SDF value of each vertex $v_i$ may then be updated as $v'_i = v_i + \Delta v_i$ and $s(v'_i) = s(v_i) + \Delta s(v_i)$. This refinement operation may flip the sign of SDF values to refine the local topology, and may move the vertices to improve the local geometry.
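A minimal sketch of the per-step update; the gcn callable and tensor layout are assumptions:

```python
import torch

def refine_surface_step(verts, sdf, feats, surf_idx, gcn, graph):
    """One surface refinement step (sketch; `gcn` is an assumed graph
    network returning per-vertex offsets, SDF residuals, and features).
    """
    dv, ds, f_new = gcn(feats[surf_idx], graph)
    verts, sdf, feats = verts.clone(), sdf.clone(), feats.clone()
    verts[surf_idx] += dv     # v' = v + dv: improves local geometry
    sdf[surf_idx] += ds       # s' = s + ds: may flip sign, editing topology
    feats[surf_idx] = f_new
    return verts, sdf, feats
```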
After surface refinement, a volume subdivision operation may be performed, followed by additional surface refinement operations. For example, $T_{surf}$ may be re-identified, and $T_{surf}$ and its neighbors may be subdivided. Non-subdivided tetrahedra may be discarded or excluded from the full tetrahedral grid in both operations, which, in embodiments, saves memory and computation, because the size of $T_{surf}$ is proportional to the surface area of the object and grows quadratically (rather than cubically) as the resolution of the grid increases. For example, as shown in FIGS. 4A and 4B, graph 400 shows the volume subdivision and surface refinement computation when non-subdivided tetrahedra are not excluded, and graph 402 shows the volume subdivision and surface refinement computation when non-subdivided tetrahedra are excluded.
Furthermore, because the SDF values and vertex positions are inherited from the level before subdivision, the loss computed at the final surface can be back-propagated to all vertices at all levels. Thus, the model can automatically learn to subdivide tetrahedra, without the additional losses at intermediate steps that previous methods needed to supervise learning of an octree hierarchy.
After extracting the surface mesh using the marching tetrahedra algorithm (e.g., operation (D) in FIG. 1), learnable surface subdivision may be applied at (E). Because the output is a triangular mesh, learnable surface subdivision can transform the output into a parameterized surface with infinite resolution, which preserves the end-to-end trainable properties of the model. In practice, a new graph may be built on the extracted mesh, and a graph convolutional network may be used to predict an updated position for each vertex $v'_i$, as well as $\alpha_i$, and then perform loop subdivision. This operation can remove quantization errors, and can adjust $\alpha_i$ to mitigate the approximation error of classical loop subdivision, in which $\alpha_i$ is fixed.
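A minimal sketch of the smoothing step with a learnable per-vertex alpha; the exact update rule below is an assumption standing in for the classical loop subdivision weights:

```python
import torch

def learnable_loop_smooth(verts, alphas, neighbors):
    """Vertex smoothing with a learnable per-vertex alpha (sketch).

    Classical loop subdivision fixes alpha by vertex valence; here
    alpha_i is predicted per vertex:
        v_i' = (1 - alpha_i) * v_i + alpha_i * mean(one-ring of v_i)
    """
    out = verts.clone()
    for i, ring in enumerate(neighbors):   # ring: one-ring vertex indices
        out[i] = (1.0 - alphas[i]) * verts[i] + alphas[i] * verts[ring].mean(dim=0)
    return out
```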
In some embodiments, given the differentiable surface representation from the model, a 3D discriminator may be applied on the final surface predicted using the 3D generator (e.g., after the implicit function 104, the marching tetrahedra algorithm, and/or the surface subdivision used to generate the explicit surface 106). The 3D discriminator may be applied to local patches sampled from high-curvature regions of the predicted mesh, and a loss, such as the adversarial loss described herein, may drive the prediction toward reconstructing high-fidelity geometric details. For example, the 3D discriminator may include a 3D Convolutional Neural Network (CNN) applied to an SDF computed from the predicted mesh to capture local details. A vertex $v$ of high curvature may be randomly selected from the target mesh, and the ground truth SDF, $S_{real} \in \mathbb{R}^{N \times N \times N}$, computed in a voxelized region around $v$. Similarly, the SDF of the predicted surface mesh $M$ may be computed at the same location to obtain $S_{pred} \in \mathbb{R}^{N \times N \times N}$. Because $S_{pred}$ is an analytic function of the mesh $M$, gradients with respect to $S_{pred}$ may be back-propagated to the vertex positions of $M$. $S_{real}$ or $S_{pred}$, together with the feature vector $F_{vol}(v, x)$ at location $v$, may be fed to the discriminator 114. The discriminator 114 may then predict a probability indicating whether the input comes from a real shape or a generated shape.
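As an illustrative sketch of such a patch discriminator (layer widths and depth are assumptions, and the conditioning on $F_{vol}(v, x)$ is omitted for brevity):

```python
import torch.nn as nn

class PatchDiscriminator3D(nn.Module):
    """Scores a (B, C, N, N, N) voxelized SDF patch as real vs. generated."""
    def __init__(self, in_ch: int = 1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_ch, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv3d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            nn.Linear(64, 1),   # real-vs-generated score for the patch
        )

    def forward(self, sdf_patch):
        return self.net(sdf_patch)
```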
The model of the present disclosure, e.g., the deep 3D conditional generative model, may be end-to-end trainable. In one or more embodiments, one or more modules may be supervised to minimize the error defined on the final predicted mesh $M$. One or more loss functions may be used, each including one or more different loss terms. For example, in a non-limiting embodiment, a loss function including three different terms may be used: a surface alignment loss to encourage alignment with the ground truth surface; an adversarial loss to improve the realism of the generated shapes; and regularization terms to regularize the behavior of the SDF and the vertex deformations.
The surface alignment loss may include sampling a set of points $P_{gt}$ from the surface of the ground truth mesh $M_{gt}$. A set of points $P_{pred}$ may also be sampled from the predicted mesh $M_{pred}$, and the L2 chamfer distance and a normal consistency loss may be minimized between $P_{gt}$ and $P_{pred}$. For example, the surface alignment loss may be calculated using the following equation (3):

$$L_{cd} = \sum_{p \in P_{pred}} \min_{q \in P_{gt}} \lVert p - q \rVert_2 + \sum_{q \in P_{gt}} \min_{p \in P_{pred}} \lVert q - p \rVert_2, \qquad L_{normal} = \sum_{p \in P_{pred}} \left(1 - \left|\vec{n}_p \cdot \vec{n}_{\hat{q}}\right|\right) \qquad (3)$$

where $\hat{q}$ is the point corresponding to $p$ when calculating the chamfer distance, and $\vec{n}_p$ and $\vec{n}_{\hat{q}}$ denote the normal directions at $p$ and $\hat{q}$, respectively.
The adversarial loss, $L_{G}$, may be calculated according to equation (4), which appears only as an image in the original publication; it encourages the generator to produce surfaces whose local patches the discriminator 114 classifies as real.
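Since equation (4) is not recoverable from the text, the following least-squares GAN objective is offered purely as a plausible stand-in under that assumption, not as the patent's actual formulation:

```python
def lsgan_losses(d_real, d_pred):
    """Least-squares GAN objectives (an assumption; see lead-in).

    d_real / d_pred: discriminator outputs on real and predicted patches.
    """
    loss_d = 0.5 * ((d_real - 1.0) ** 2).mean() + 0.5 * (d_pred ** 2).mean()
    loss_g = 0.5 * ((d_pred - 1.0) ** 2).mean()
    return loss_d, loss_g
```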
with respect to regularization, the penalty functions of equations (3) and (4) operate on the extracted surface, so that only vertices in the tetrahedral grid that are close to the iso-surface may receive a gradient, while other vertices may not. Surface loss may also not provide information about the interior and/or exterior, because flipping the SDF symbols of all vertices in a tetrahedron results in the same surface being extracted by the marching tetrahedron algorithm. This can result in disconnected components during training, so SDF losses can be added to normalize the SDF values. In some embodiments, the SDF regularization loss may be calculated according to equation (5) below:
L=λ cd L cdnormal L normalG L GSDF L SDFdef L def (5) Wherein λ cd 、λ normal 、λ G 、λ SDF And λ def Is a hyper-parameter.
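A minimal sketch of the weighted combination in equation (5); the default weights are placeholders, as the patent only states that they are hyperparameters:

```python
def total_loss(l_cd, l_normal, l_g, l_sdf, l_def,
               lam_cd=1.0, lam_normal=1.0, lam_g=1.0,
               lam_sdf=1.0, lam_def=1.0):
    """Weighted sum of the training losses per equation (5)."""
    return (lam_cd * l_cd + lam_normal * l_normal + lam_g * l_g
            + lam_sdf * l_sdf + lam_def * l_def)
```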
Referring now to fig. 5, each block of the method 500 described herein includes a computing process that may be performed using any combination of hardware, firmware, and/or software. For example, various functions may be carried out by a processor executing instructions stored in memory. The method 500 may also be embodied as computer-useable instructions stored on a computer storage medium. Method 500 may be provided by a stand-alone application, a service or hosted service (either alone or in combination with another hosted service), or a plug-in to another product, to name a few. Further, as an example, method 500 is described with respect to process 100 of fig. 1. However, the method 500 may additionally or alternatively be performed by any one or combination of processes or systems, including but not limited to those described herein.
FIG. 5 is a flow diagram illustrating a method 500 for high resolution shape synthesis, according to some embodiments of the present disclosure. At block B502, the method 500 includes computing a Signed Distance Field (SDF) at an initial grid resolution of a tetrahedral grid based at least in part on an input representation of an object. For example, using the input 102, the SDF may be computed at the initial grid resolution of the tetrahedral grid.
At block B504, method 500 includes subdividing and deforming the tetrahedral grid to generate an updated tetrahedral grid at the updated resolution. For example, the tetrahedral grid may be selectively subdivided and deformed.
At block B506, the method 500 includes calculating an updated SDF using the SDF and the updated tetrahedral grid. For example, based on the subdivision and the deformation, the SDF values of the updated vertices of the updated tetrahedral grid may be computed.
In some embodiments, the operations of block B504 and/or block B506 may be performed multiple times — e.g., until a target resolution is reached.
At block B508, the method 500 includes performing a marching tetrahedra algorithm on the updated tetrahedral grid to generate a triangular mesh. For example, the marching tetrahedra algorithm may be performed on the deformable grid (e.g., after subdividing, deforming, and updating the SDF) to extract the iso-surface (e.g., a triangular mesh).
At block B510, the method 500 includes subdividing the triangular mesh to generate a final surface representation of the object. For example, surface subdivision may be applied to the iso-surface to generate a parameterized (e.g., explicit) surface as the output 108.
Example computing device
Fig. 6 is a block diagram of an example computing device 600 suitable for use in implementing some embodiments of the present disclosure. The computing device 600 may include an interconnection system 602 that directly or indirectly couples the following devices: memory 604, one or more Central Processing Units (CPUs) 606, one or more Graphics Processing Units (GPUs) 608, a communication interface 610, input/output (I/O) ports 612, input/output components 614, a power supply 616, one or more presentation components 618 (e.g., a display), and one or more logic units 620. In at least one embodiment, computing device 600 may include one or more Virtual Machines (VMs), and/or any components thereof may include virtual components (e.g., virtual hardware components). For non-limiting examples, the one or more GPUs 608 may include one or more vGPU, the one or more CPUs 606 may include one or more vGPU, and/or the one or more logic units 620 may include one or more virtual logic units. Thus, computing device 600 may include discrete components (e.g., a complete GPU dedicated to computing device 600), virtual components (e.g., a portion of a GPU dedicated to computing device 600), or a combination thereof.
Although the various blocks of fig. 6 are shown connected via an interconnect system 602 having lines, this is not intended to be limiting and is for clarity only. For example, in some embodiments, a presentation component 618, such as a display device, may be considered an I/O component 614 (e.g., if the display is a touch screen). As another example, the CPU 606 and/or the GPU 608 may include memory (e.g., the memory 604 may represent a storage device other than memory of the GPU 608, the CPU 606, and/or other components). In other words, the computing device of fig. 6 is merely illustrative. No distinction is made between categories such as "workstation," "server," "laptop," "desktop," "tablet," "client device," "mobile device," "handheld device," "gaming console," "Electronic Control Unit (ECU)," "virtual reality system," and/or other device or system types, as all are contemplated within the scope of the computing device of fig. 6.
The interconnect system 602 may represent one or more links or buses, such as an address bus, a data bus, a control bus, or a combination thereof. The interconnect system 602 may include one or more links or bus types, such as an Industry Standard Architecture (ISA) bus, an Extended Industry Standard Architecture (EISA) bus, a Video Electronics Standards Association (VESA) bus, a Peripheral Component Interconnect (PCI) bus, a peripheral component interconnect express (PCIe) bus, and/or another type of bus or link. In some embodiments, there is a direct connection between the components. By way of example, the CPU 606 may be directly connected to the memory 604. Further, the CPU 606 may be directly connected to the GPU 608. Where there is a direct or point-to-point connection between components, the interconnect system 602 may include a PCIe link to perform the connection. In these examples, the PCI bus need not be included in computing device 600.
Memory 604 may include any of a variety of computer-readable media. Computer readable media can be any available media that can be accessed by computing device 600. Computer readable media may include both volatile and nonvolatile media, and removable and non-removable media. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media.
Computer storage media may include volatile and nonvolatile media, and/or removable and non-removable media, implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, and/or other data types. For example, memory 604 may store computer readable instructions (e.g., that represent a program and/or program element, such as an operating system). Computer storage media may include, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computing device 600. As used herein, computer storage media does not include signals per se.
Communication media may embody computer readable instructions, data structures, program modules, and/or other data types in a modulated data signal such as a carrier wave or other transport mechanism, and include any information delivery media. The term "modulated data signal" may refer to a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media may include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared, and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media.
The CPU 606 may be configured to execute at least some of the computer readable instructions in order to control one or more components of the computing device 600 to perform one or more of the methods and/or processes described herein. Each of the CPUs 606 may include one or more cores (e.g., one, two, four, eight, twenty-eight, seventy-two, etc.) capable of processing a large number of software threads simultaneously. The CPU 606 may include any type of processor, and may include different types of processors, depending on the type of computing device 600 implemented (e.g., a processor with fewer cores for a mobile device and a processor with more cores for a server). For example, depending on the type of computing device 600, the processor may be an Advanced RISC Machines (ARM) processor implemented using Reduced Instruction Set Computing (RISC) or an x86 processor implemented using Complex Instruction Set Computing (CISC). In addition to one or more microprocessors or supplemental coprocessors, such as math coprocessors, the computing device 600 may also include one or more CPUs 606.
The GPU 608 may be configured to execute at least some computer readable instructions, in addition to or in lieu of the CPU 606, to control one or more components of the computing device 600 to perform one or more of the methods and/or processes described herein. The one or more GPUs 608 can be integrated GPUs (e.g., with one or more CPUs 606) and/or the one or more GPUs 608 can be discrete GPUs. In embodiments, the one or more GPUs 608 may be coprocessors of the one or more CPUs 606. The computing device 600 may use the GPU 608 to render graphics (e.g., 3D graphics) or perform general-purpose computations. For example, the GPU 608 may be used for General-Purpose computing on GPUs (GPGPU). The GPU 608 may include hundreds or thousands of cores capable of processing hundreds or thousands of software threads simultaneously. The GPU 608 may generate pixel data for output images in response to rendering commands (e.g., received from the CPU 606 via a host interface). The GPU 608 may include graphics memory, such as display memory, for storing pixel data or any other suitable data (e.g., GPGPU data). The display memory may be included as part of the memory 604. The GPUs 608 may include two or more GPUs operating in parallel (e.g., via a link). The link may connect the GPUs directly (e.g., using NVLINK) or may connect the GPUs through a switch (e.g., using NVSwitch). When combined together, each GPU 608 may generate pixel data or GPGPU data for a different portion of an output or for a different output (e.g., a first GPU for a first image and a second GPU for a second image). Each GPU may include its own memory, or may share memory with other GPUs.
In addition to or in lieu of CPU 606 and/or GPU 608, logic 620 may be configured to execute at least some computer-readable instructions to control one or more components of computing device 600 to perform one or more of the methods and/or processes described herein. In embodiments, the CPU 606, GPU 608, and/or logic 620 may perform any combination of methods, processes, and/or portions thereof, either discretely or jointly. The one or more logic units 620 may be part of and/or integrated within the one or more CPUs 606 and/or the one or more GPUs 608 and/or the one or more logic units 620 may be discrete components of the CPUs 606 and/or GPUs 608 or otherwise external thereto. In an embodiment, the one or more logic units 620 may be one or more CPUs 606 and/or one or more processors of the GPU 608.
Examples of the logic unit 620 include one or more processing cores and/or components thereof, such as a Data Processing Unit (DPU), Tensor Core (TC), Tensor Processing Unit (TPU), Pixel Visual Core (PVC), Vision Processing Unit (VPU), Graphics Processing Cluster (GPC), Texture Processing Cluster (TPC), Streaming Multiprocessor (SM), Tree Traversal Unit (TTU), Artificial Intelligence Accelerator (AIA), Deep Learning Accelerator (DLA), Arithmetic Logic Unit (ALU), Application Specific Integrated Circuit (ASIC), Floating Point Unit (FPU), input/output (I/O) elements, Peripheral Component Interconnect (PCI) or peripheral component interconnect express (PCIe) elements, and so forth.
Communication interface 610 may include one or more receivers, transmitters, and/or transceivers to enable computing device 600 to communicate with other computing devices via an electronic communication network, including wired and/or wireless communication. Communication interface 610 may include components and functionality to enable communication over any of a number of different networks, such as a wireless network (e.g., wi-Fi, Z-wave, bluetooth LE, zigBee, etc.), a wired network (e.g., communication over ethernet or InfiniBand), a low-power wide area network (e.g., loRaWAN, sigFox, etc.), and/or the internet. In one or more embodiments, the logic unit 620 and/or the communication interface 610 may include one or more Data Processing Units (DPUs) for sending data received over a network and/or through the interconnect system 602 directly to (e.g., a memory thereof) the one or more GPUs 608.
The I/O ports 612 may enable the computing device 600 to be logically coupled to other devices, including the I/O components 614, the presentation components 618, and/or other components, some of which may be built into (e.g., integrated into) the computing device 600. Illustrative I/O components 614 include a microphone, mouse, keyboard, joystick, gamepad, game controller, satellite dish, scanner, printer, wireless device, and the like. The I/O components 614 may provide a Natural User Interface (NUI) that handles air gestures, speech, or other physiological inputs generated by a user. In some instances, inputs may be transmitted to an appropriate network element for further processing. A NUI may implement any combination of speech recognition, stylus recognition, facial recognition, biometric recognition, gesture recognition both on screen and adjacent to the screen, air gestures, head and eye tracking, and touch recognition associated with a display of the computing device 600 (as described in more detail below). The computing device 600 may include depth cameras, such as stereoscopic camera systems, infrared camera systems, RGB camera systems, touch screen technology, and combinations of these, for gesture detection and recognition. Further, the computing device 600 may include an accelerometer or gyroscope (e.g., as part of an Inertial Measurement Unit (IMU)) that enables detection of motion. In some examples, the output of the accelerometer or gyroscope may be used by the computing device 600 to render immersive augmented reality or virtual reality.
The power source 616 may include a hard-wired power source, a battery power source, or a combination thereof. The power supply 616 may provide power to the computing device 600 to enable the components of the computing device 600 to operate.
The presentation component 618 may include a display (e.g., a monitor, touch screen, television screen, heads-up display (HUD), other display types, or combinations thereof), speakers, and/or other presentation components. The presentation component 618 can receive data from other components (e.g., GPU 608, CPU 606, DPU, etc.) and output the data (e.g., as images, video, sound, etc.).
Example data center
Fig. 7 illustrates an example data center 700 that may be used in at least one embodiment of the present disclosure. The data center 700 may include a data center infrastructure layer 710, a framework layer 720, a software layer 730, and an application layer 740.
As shown in fig. 7, the data center infrastructure layer 710 may include a resource coordinator 712, grouped computing resources 714, and nodal computing resources ("nodal c.r.") 716 (1) -716 (N), where "N" represents any whole positive integer. In at least one embodiment, nodes c.r.716 (1) -716 (N) may include, but are not limited to, any number of central processing units ("CPUs") or other processors (including DPUs, accelerators, field Programmable Gate Arrays (FPGAs), graphics processors or Graphics Processing Units (GPUs), etc.), memory devices (e.g., dynamic read only memory), storage devices (e.g., solid state drives or disk drives), network input/output ("NW I/O") devices, network switches, virtual machines ("VMs"), power modules, and cooling modules, etc. In some embodiments, one or more of the nodes c.r.716 (1) -716 (N) may correspond to a server having one or more of the above-described computing resources. Further, in some embodiments, nodes c.r.716 (1) -716 (N) may include one or more virtual components, such as vGPU, vCPU, etc., and/or one or more of nodes c.r.716 (1) -716 (N) may correspond to a Virtual Machine (VM).
In at least one embodiment, grouped computing resources 714 may comprise a single group (not shown) of nodes c.r.716 housed within one or more racks, or a number of racks (also not shown) housed within data centers at various geographic locations. The individual groupings of nodes c.r.716 within the grouped computing resources 714 may include computing, network, memory, or storage resources that may be configured or allocated as groups to support one or more workloads. In at least one embodiment, several nodes c.r.716, including CPUs, GPUs, DPUs, and/or other processors, may be grouped within one or more racks to provide computing resources to support one or more workloads. One or more racks may also include any number of power modules, cooling modules, and/or network switches in any combination.
The resource coordinator 712 may configure or otherwise control one or more nodes c.r.716 (1) -716 (N) and/or grouped computing resources 714. In at least one embodiment, resource coordinator 712 may include a software design infrastructure ("SDI") management entity for data center 700. The resource coordinator 712 may comprise hardware, software, or some combination thereof.
In at least one embodiment, as shown in FIG. 7, the framework layer 720 may include a job scheduler 732, a configuration manager 734, a resource manager 736, and a distributed file system 738. The framework layer 720 may include a framework that supports the software 732 of the software layer 730 and/or one or more applications 742 of the application layer 740. The software 732 or applications 742 may include web-based service software or applications, respectively, such as those provided by Amazon Web Services, Google Cloud, and Microsoft Azure. The framework layer 720 may be, but is not limited to, a type of free and open-source software web application framework, such as Apache Spark™ (hereinafter "Spark"), that may utilize the distributed file system 738 for large-scale data processing (e.g., "big data"). In at least one embodiment, the job scheduler 732 may include a Spark driver to facilitate scheduling of workloads supported by the various layers of the data center 700. In at least one embodiment, the configuration manager 734 may be capable of configuring different layers, such as the software layer 730 and the framework layer 720, including Spark and the distributed file system 738 for supporting large-scale data processing. The resource manager 736 may be capable of managing clustered or grouped computing resources mapped to or allocated to support the distributed file system 738 and the job scheduler 732. In at least one embodiment, the clustered or grouped computing resources may include the grouped computing resources 714 at the data center infrastructure layer 710. The resource manager 736 may coordinate with the resource coordinator 712 to manage these mapped or allocated computing resources.
In at least one embodiment, the software 732 included in the software layer 730 may include software used by at least portions of the nodes c.r.716 (1)-716 (N), the grouped computing resources 714, and/or the distributed file system 738 of the framework layer 720. One or more types of software may include, but are not limited to, internet web search software, email virus scanning software, database software, and streaming video content software.
In at least one embodiment, the one or more applications 742 included in the application layer 740 may include one or more types of applications used by at least portions of the nodes c.r.716 (1)-716 (N), the grouped computing resources 714, and/or the distributed file system 738 of the framework layer 720. The one or more types of applications may include, but are not limited to, any number of genomics applications, cognitive computing and machine learning applications, including training or inference software, machine learning framework software (e.g., PyTorch, TensorFlow, Caffe, etc.), and/or other machine learning applications used in connection with one or more embodiments.
In at least one embodiment, any of the configuration manager 734, the resource manager 736, and the resource coordinator 712 may implement any number and type of self-modifying actions based on any number and type of data obtained in any technically feasible manner. The self-modifying actions may relieve a data center operator of the data center 700 from making potentially bad configuration decisions and may avoid underutilized and/or poorly performing portions of the data center.
Data center 700 may include tools, services, software, or other resources for training one or more machine learning models or using one or more machine learning models to predict or infer information in accordance with one or more embodiments described herein. For example, the machine learning model may be trained by computing the weight parameters from a neural network architecture using the software and computing resources described above with respect to the data center 700. In at least one embodiment, using the weight parameters calculated through one or more training techniques, the information can be inferred or predicted using the resources described above with respect to data center 700 using a trained machine learning model corresponding to one or more neural networks, such as, but not limited to, those described herein.
In at least one embodiment, the data center 700 may use a CPU, Application Specific Integrated Circuit (ASIC), GPU, FPGA, and/or other hardware (or virtual computing resources corresponding thereto) to perform training and/or inference using the aforementioned resources. Further, one or more of the software and/or hardware resources described above may be configured as a service to allow a user to train or perform inference on information, such as image recognition, speech recognition, or other artificial intelligence services.
Example network Environment
A network environment suitable for implementing embodiments of the present disclosure may include one or more client devices, servers, network Attached Storage (NAS), other backend devices, and/or other device types. Client devices, servers, and/or other device types (e.g., each device) may be implemented on one or more instances of computing device 600 of fig. 6-e.g., each device may include similar components, features, and/or functionality of computing device 600. Further, where backend devices (e.g., servers, NAS, etc.) are implemented, the backend devices can be included as part of data center 700, examples of which are described in more detail herein with respect to fig. 7.
The components of the network environment may communicate with each other over a network, which may be wired, wireless, or both. The network may include multiple networks, or a network of multiple networks. For example, the network may include one or more Wide Area Networks (WANs), one or more Local Area Networks (LANs), one or more public networks (e.g., the internet and/or the Public Switched Telephone Network (PSTN)), and/or one or more private networks. Where the network comprises a wireless telecommunications network, components such as base stations, communication towers, or even access points (among other components) may provide wireless connectivity.
Compatible network environments may include one or more peer-to-peer network environments (in which case a server may not be included in the network environment), and one or more client-server network environments (in which case one or more servers may be included in the network environment). In a peer-to-peer network environment, the functionality described herein with respect to a server may be implemented on any number of client devices.
In at least one embodiment, the network environment may include one or more cloud-based network environments, distributed computing environments, combinations thereof, and the like. A cloud-based network environment may include a framework layer, a job scheduler, a resource manager, and a distributed file system implemented on one or more servers, which may include one or more core network servers and/or edge servers. The framework layer may include a framework for supporting software of the software layer and/or one or more applications of the application layer. The software or application may comprise a web-based service software or application, respectively. In embodiments, one or more client devices may use network-based service software or applications (e.g., by accessing the service software and/or applications via one or more Application Programming Interfaces (APIs)). The framework layer may be, but is not limited to, a type of free and open source software web application framework, such as may be used for large-scale data processing (e.g., "big data") using a distributed file system.
A cloud-based network environment may provide cloud computing and/or cloud storage that performs any combination of the computing and/or data storage functions described herein (or one or more portions thereof). Any of these various functions may be distributed across multiple locations from a central or core server (e.g., which may be distributed across one or more data centers in a state, region, country, world, etc.). If the connection to the user (e.g., client device) is relatively close to the edge server, the core server may assign at least a portion of the functionality to the edge server. A cloud-based network environment may be private (e.g., limited to only a single organization), may be public (e.g., available to many organizations), and/or a combination thereof (e.g., a hybrid cloud environment).
The client device may include at least some of the components, features, and functionality of the example computing device 600 described herein with respect to fig. 6. By way of example and not limitation, a client device may be embodied as a Personal Computer (PC), laptop computer, mobile device, smartphone, tablet computer, smart watch, wearable computer, personal Digital Assistant (PDA), MP3 player, virtual reality head-mounted display, global Positioning System (GPS) or device, video player, camera, surveillance device or system, vehicle, watercraft, aircraft, virtual machine, drone, robot, handheld communication device, hospital device, gaming device or system, entertainment system, in-vehicle computer system, embedded system controller, remote control, appliance, consumer electronics, workstation, edge device, any combination of these descriptive devices, or any other suitable device.
The disclosure may be described in the general context of machine-useable instructions, or computer code, including computer-executable instructions such as program modules, being executed by a computer or other machine, such as a personal digital assistant or other handheld device. Generally, program modules, including routines, programs, objects, components, data structures, etc., refer to code that performs particular tasks or implements particular abstract data types. The present disclosure may be practiced in a variety of system configurations, including handheld devices, consumer electronics, general-purpose computers, more specialized computing devices, and the like. The disclosure may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.
As used herein, a statement that "and/or" pertains to two or more elements should be interpreted as referring to only one element or a combination of elements. For example, "element a, element B, and/or element C" may include only element a, only element B, only element C, element a and element B, element a and element C, element B and element C, or elements a, B, and C. Further, "at least one of element a or element B" may include at least one of element a, at least one of element B, or at least one of element a and at least one of element B. Further, "at least one of element a and element B" may include at least one of element a, at least one of element B, or at least one of element a and at least one of element B.
The subject matter of the present disclosure is described with specificity herein to meet statutory requirements. However, the description itself is not intended to limit the scope of this disclosure. Rather, the inventors have contemplated that the claimed subject matter might also be embodied in other ways, to include different steps or combinations of steps similar to the ones described in this document, in conjunction with other present or future technologies. Moreover, although the terms "step" and/or "block" may be used herein to connote different elements of methods employed, the terms should not be interpreted as implying any particular order among or between various steps herein disclosed unless and except when the order of individual steps is explicitly described.

Claims (22)

1. A processor, comprising: one or more circuits to: compute a Signed Distance Field (SDF) at an initial grid resolution of a grid based at least in part on an input representation of an object; subdivide and deform the grid to generate an updated grid at an updated resolution; calculate an updated SDF using the SDF and the updated grid; generate a triangular mesh using the updated grid; and subdivide the triangular mesh to generate a final surface representation of the object.
2. The processor of claim 1, wherein the final surface representation comprises a parameterized surface representation.
3. The processor of claim 1, wherein subdivision of the triangular mesh is performed using a learned surface subdivision.
4. The processor of claim 1, wherein the input representation of the object comprises at least one of a voxel representation, a point cloud, or a three-dimensional (3D) scan.
5. The processor of claim 1, wherein the updated SDF is interpolated from the SDF using one or more updated vertex positions of an updated tetrahedral grid.
6. The processor of claim 1, wherein the calculation of the SDF is performed at least in part by: calculating one or more first feature vectors using a convolutional neural network; and calculating one or more SDF values and one or more second feature vectors for one or more vertices of the grid using a neural network and based at least in part on the one or more first feature vectors.
7. The processor of claim 1, wherein the subdividing and the deforming of the grid are performed at least in part by: identifying one or more surface volumes of the grid corresponding to a surface of the object; generating a graph corresponding to one or more vertices and one or more edges of the one or more surface volumes; and calculating one or more location offsets and one or more residual SDF values for the one or more vertices using a graph convolution network and based at least in part on the graph.
8. The processor of claim 1, wherein the subdivision of the grid comprises a selective subdivision, wherein the selective subdivision comprises subdivision of at least one of: one or more first surface volumes of the grid intersecting a surface of the object; or one or more second surface volumes immediately adjacent to the one or more first surface volumes.
9. The processor of claim 1, wherein the one or more circuits are to generate the final surface representation using a generative adversarial network (GAN).
10. The processor of claim 1, wherein the processor is included in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system for performing simulation operations; a system for performing light transport simulation; a system for performing collaborative content creation for 3D assets; a system for performing deep learning operations; a system implemented using an edge device; a system implemented using a robot; a system for performing conversational artificial intelligence operations; a system for generating synthetic data; a system incorporating one or more Virtual Machines (VMs); a system implemented at least in part in a data center; or a system implemented at least in part using cloud computing resources.
11. A system, comprising: one or more processing units comprising processing circuitry to: compute a Signed Distance Field (SDF) corresponding to an initial grid based at least in part on an input representation of an object; subdivide and deform the initial grid to generate an updated grid; calculate an updated SDF using the SDF and the updated grid; generate an explicit surface representation using the updated grid; and subdivide the explicit surface representation to generate a final surface representation of the object.
12. The system of claim 11, wherein the final surface representation comprises a parameterized surface representation.
13. The system of claim 11, wherein the subdivision of the explicit surface representation is performed using a learned surface subdivision.
14. The system of claim 11, wherein the input representation of the object comprises at least one of a voxel representation, a point cloud, or a three-dimensional (3D) scan.
15. The system of claim 11, wherein the updated SDF is interpolated from the SDF using one or more updated vertex positions of an updated tetrahedral grid.
16. The system of claim 11, wherein the calculation of the SDF is performed at least in part by: calculating one or more first feature vectors using a convolutional neural network; and calculating, using a neural network and based at least in part on the one or more first feature vectors, one or more SDF values and one or more second feature vectors for one or more vertices of the grid.
17. The system of claim 11, wherein the subdividing and the deforming of the grid are performed at least in part by: identifying one or more surface volumes of the grid corresponding to a surface of the object; generating a graph corresponding to one or more vertices and one or more edges of the one or more surface volumes; and calculating one or more location offsets and one or more residual SDF values for the one or more vertices using a graph convolution network and based at least in part on the graph.
18. The system of claim 11, wherein the subdivision of the grid comprises a selective subdivision, wherein the selective subdivision comprises subdivision of at least one of: one or more first surface volumes of the grid intersecting a surface of the object; or one or more second surface volumes immediately adjacent to the one or more first surface volumes.
19. The system of claim 11, wherein the one or more processing units generate the final surface representation using a generative adversarial network (GAN).
20. The system of claim 11, wherein the system is included in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system for performing simulation operations; a system for performing light transport simulation; a system for performing collaborative content creation for 3D assets; a system for performing deep learning operations; a system implemented using an edge device; a system implemented using a robot; a system for performing conversational artificial intelligence operations; a system for generating synthetic data; a system incorporating one or more Virtual Machines (VMs); a system implemented at least in part in a data center; or a system implemented at least in part using cloud computing resources.
21. A processor, comprising: processing circuitry to: generate an implicit representation of a shape using a Signed Distance Field (SDF) corresponding to a deformable grid; extract an iso-surface from the deformable grid; and generate an explicit representation of the shape from the extracted iso-surface using a generative adversarial network (GAN).
22. The processor of claim 21, wherein the processor is included in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system for performing simulation operations; a system for performing light transport simulation; a system for performing collaborative content creation for 3D assets; a system for performing deep learning operations; a system implemented using an edge device; a system implemented using a robot; a system for performing conversational artificial intelligence operations; a system for generating synthetic data; a system incorporating one or more Virtual Machines (VMs); a system implemented at least in part in a data center; or a system implemented at least in part using cloud computing resources.
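To make the two-stage SDF prediction recited in claims 6 and 16 concrete, the following is a minimal sketch in PyTorch: a small 3D convolutional network encodes a coarse voxel input into a feature volume, and a per-vertex MLP maps each grid vertex position plus its sampled feature to an SDF value and a second feature vector. All layer sizes, module names, and the use of grid_sample for feature lookup are illustrative assumptions, not the claimed implementation.

```python
import torch
import torch.nn as nn


class SDFPredictor(nn.Module):
    # Hypothetical module: a 3D CNN encodes the coarse input, and an MLP
    # predicts, for each grid vertex, an SDF value plus a feature vector.
    def __init__(self, feat_dim: int = 32):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(16, feat_dim, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 3, 64), nn.ReLU(),
            nn.Linear(64, 1 + feat_dim),  # 1 SDF value + second feature vector
        )

    def forward(self, voxels: torch.Tensor, vert_pos: torch.Tensor):
        # voxels: (1, 1, D, H, W) coarse input; vert_pos: (V, 3) in [-1, 1]^3
        feats = self.cnn(voxels)                        # (1, C, D, H, W)
        grid = vert_pos.view(1, -1, 1, 1, 3)            # sampling locations
        sampled = nn.functional.grid_sample(feats, grid, align_corners=True)
        sampled = sampled.view(feats.shape[1], -1).t()  # (V, C)
        out = self.mlp(torch.cat([vert_pos, sampled], dim=1))
        return out[:, 0], out[:, 1:]                    # per-vertex SDF, features
```

For example, `SDFPredictor()(torch.randn(1, 1, 16, 16, 16), torch.rand(100, 3) * 2 - 1)` would return 100 SDF values and 100 feature vectors for 100 grid vertices.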
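The selective subdivision of claims 8 and 18, together with the SDF interpolation of claims 5 and 15, can be sketched as follows: tetrahedra whose vertex SDF values change sign (and their immediate neighbors) are flagged, and midpoint vertices created on their edges receive SDF values interpolated from the edge endpoints. The 1-to-8 tetrahedral split bookkeeping is omitted for brevity, and the neighbor test ("shares a vertex") is a simplifying assumption.

```python
import numpy as np


def flag_and_split_surface_tets(verts, tets, sdf, iso=0.0):
    # verts: (V, 3) positions, tets: (T, 4) vertex indices, sdf: (V,) values.
    inside = sdf[tets] < iso                             # (T, 4) boolean
    crossing = inside.any(axis=1) & ~inside.all(axis=1)  # mixed-sign tets
    surf_verts = set(tets[crossing].ravel().tolist())
    flagged = crossing | np.array(
        [any(v in surf_verts for v in tet) for tet in tets])

    # Create one midpoint vertex per edge of every flagged tetrahedron and
    # interpolate its SDF value from the edge endpoints.
    new_verts, new_sdf, edge_to_new = [], [], {}
    for tet in tets[flagged]:
        for i in range(4):
            for j in range(i + 1, 4):
                key = (min(tet[i], tet[j]), max(tet[i], tet[j]))
                if key not in edge_to_new:
                    edge_to_new[key] = len(verts) + len(new_verts)
                    new_verts.append(0.5 * (verts[key[0]] + verts[key[1]]))
                    new_sdf.append(0.5 * (sdf[key[0]] + sdf[key[1]]))
    if new_verts:
        verts = np.vstack([verts, np.asarray(new_verts)])
        sdf = np.concatenate([sdf, np.asarray(new_sdf)])
    return verts, sdf, flagged, edge_to_new
```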
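Claims 7 and 17 recite a graph convolution network that predicts per-vertex position offsets and residual SDF values over the surface volumes. A minimal sketch of one such layer follows, assuming mean-aggregation message passing and a 4-channel output head (three offset components plus one residual SDF value); the architecture and sizes are illustrative, not the patented network.

```python
import torch
import torch.nn as nn


class RefineGCN(nn.Module):
    # Hypothetical refinement layer: mean-aggregation graph convolution that
    # outputs a 3D position offset and a residual SDF value per vertex.
    def __init__(self, feat_dim: int = 64, hidden: int = 128):
        super().__init__()
        self.lin_self = nn.Linear(feat_dim, hidden)
        self.lin_nbr = nn.Linear(feat_dim, hidden)
        self.head = nn.Linear(hidden, 4)  # 3 offset components + 1 residual SDF

    def forward(self, x: torch.Tensor, edges: torch.Tensor):
        # x: (V, feat_dim) per-vertex features; edges: (2, E) directed pairs.
        src, dst = edges
        agg = x.new_zeros(x.shape).index_add(0, dst, x[src])  # sum neighbors
        deg = x.new_zeros(x.shape[0], 1).index_add(
            0, dst, x.new_ones(src.shape[0], 1))
        agg = agg / deg.clamp(min=1)                           # mean aggregate
        h = torch.relu(self.lin_self(x) + self.lin_nbr(agg))
        out = self.head(h)
        return out[:, :3], out[:, 3]  # (position offsets, residual SDF values)
```

In this reading, the refined grid would use vertex positions `v + offset` and SDF values `s + residual`, matching the deform-and-refine step in the claims.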
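Generating a triangular mesh from the updated grid (claims 1 and 11) and extracting an iso-surface from a deformable grid (claim 21) correspond to a marching-tetrahedra step: each tetrahedron that straddles the zero level set contributes one or two triangles, with triangle vertices placed by linear interpolation along sign-changing edges. The sketch below assumes a NumPy representation and does not enforce consistent face winding.

```python
import numpy as np


def marching_tetrahedra(verts, tets, sdf, iso=0.0):
    # verts: (V, 3), tets: (T, 4), sdf: (V,). Returns (mesh_verts, mesh_faces).
    mesh_verts, mesh_faces = [], []
    edge_cache = {}  # grid edge -> index of its interpolated crossing vertex

    def crossing(i, j):
        # Place a vertex at the linear zero crossing of edge (i, j).
        key = (min(i, j), max(i, j))
        if key not in edge_cache:
            t = (iso - sdf[i]) / (sdf[j] - sdf[i])
            edge_cache[key] = len(mesh_verts)
            mesh_verts.append(verts[i] + t * (verts[j] - verts[i]))
        return edge_cache[key]

    for tet in tets:
        inside = [v for v in tet if sdf[v] < iso]
        outside = [v for v in tet if sdf[v] >= iso]
        if not inside or not outside:
            continue                      # tet does not straddle the surface
        if len(inside) == 1:              # one vertex inside -> one triangle
            a = inside[0]
            mesh_faces.append([crossing(a, b) for b in outside])
        elif len(inside) == 3:            # three inside -> one triangle
            a = outside[0]
            mesh_faces.append([crossing(a, b) for b in inside])
        else:                             # two inside -> quad -> two triangles
            a, b = inside
            c, d = outside
            p = [crossing(a, c), crossing(a, d), crossing(b, d), crossing(b, c)]
            mesh_faces.append([p[0], p[1], p[2]])
            mesh_faces.append([p[0], p[2], p[3]])
    return np.asarray(mesh_verts), np.asarray(mesh_faces, dtype=np.int64)
```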
CN202280003721.0A 2021-05-28 2022-04-11 System and application for generating synthetic data for synthesizing high resolution 3D shapes from low resolution representations Pending CN115699094A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US202163194693P 2021-05-28 2021-05-28
US63/194,693 2021-05-28
PCT/US2022/024306 WO2022250796A1 (en) 2021-05-28 2022-04-11 Synthesizing high resolution 3d shapes from lower resolution representations for synthetic data generation systems and applications

Publications (1)

Publication Number Publication Date
CN115699094A (en) 2023-02-03

Family

ID=85057196

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202280003721.0A Pending CN115699094A (en) 2021-05-28 2022-04-11 System and application for generating synthetic data for synthesizing high resolution 3D shapes from low resolution representations

Country Status (1)

Country Link
CN (1) CN115699094A (en)

Similar Documents

Publication Publication Date Title
US11983815B2 (en) Synthesizing high resolution 3D shapes from lower resolution representations for synthetic data generation systems and applications
US10977549B2 (en) Object animation using generative neural networks
US20220383570A1 (en) High-precision semantic image editing using neural networks for synthetic data generation systems and applications
US11875449B2 (en) Real-time rendering with implicit shapes
US11610370B2 (en) Joint shape and appearance optimization through topology sampling
US20230237342A1 (en) Adaptive lookahead for planning and learning
US20210383241A1 (en) Training neural networks with limited data using invertible augmentation operators
CN113762461A (en) Training neural networks with finite data using reversible enhancement operators
US20220391176A1 (en) Configuring machine learning models for training and deployment using graphical components
US20230062503A1 (en) Pruning and accelerating neural networks with hierarchical fine-grained structured sparsity
US20230298243A1 (en) 3d digital avatar generation from a single or few portrait images
US20230153612A1 (en) Pruning complex deep learning models based on parent pruning information
US11922558B2 (en) Hybrid differentiable rendering for light transport simulation systems and applications
US20210110001A1 (en) Machine learning for animatronic development and optimization
CN116664807A (en) Texture transfer and synthesis using alignment maps in image generation systems and applications
US20220398283A1 (en) Method for fast and better tree search for reinforcement learning
WO2022251619A1 (en) Hybrid differentiable rendering for light transport simulation systems and applications
CN115699094A (en) System and application for generating synthetic data for synthesizing high resolution 3D shapes from low resolution representations
CN116710974A Domain adaptation using domain adversarial learning in synthetic data systems and applications
US20230290057A1 (en) Action-conditional implicit dynamics of deformable objects
US20240177034A1 (en) Simulating quantum computing circuits using kronecker factorization
US20230377324A1 (en) Multi-domain generative adversarial networks for synthetic data generation
US20240185506A1 (en) Hybrid differentiable rendering for light transport simulation systems and applications
US20240160888A1 (en) Realistic, controllable agent simulation using guided trajectories and diffusion models
US20240171788A1 (en) High-resolution video generation using image diffusion models

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination