CN112891945B

CN112891945B - Data processing method and device, electronic equipment and storage medium

Info

Publication number: CN112891945B
Application number: CN202110326382.7A
Authority: CN
Inventors: 殷俊; 汤嘉恒; 杨洁; 高林
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2021-03-26
Filing date: 2021-03-26
Publication date: 2022-11-18
Anticipated expiration: 2041-03-26
Also published as: CN112891945A

Abstract

The embodiment of the application provides a data processing method and device, electronic equipment and a storage medium, and relates to the technical field of artificial intelligence. The method comprises the following steps: acquiring source data corresponding to the three-dimensional object; based on the source data, acquiring first geometric characteristics of nodes of multiple levels with a tree structure corresponding to the source data, wherein the nodes of different levels correspond to different spaces in a three-dimensional space of the three-dimensional object; determining a target node in the plurality of nodes based on the acquired first geometric characteristics of the plurality of nodes, and decoding the characteristic representation of the target node to obtain second geometric characteristics corresponding to each sub-node under the target node; and processing the three-dimensional object based on the second geometric characteristics corresponding to the nodes obtained by decoding to obtain corresponding processing results. Based on the scheme of the application, the extraction efficiency and accuracy of the geometric features of the three-dimensional object can be effectively improved, the data processing requirements corresponding to the three-dimensional object are better met, and the processing efficiency and effect are improved.

Description

Data processing method and device, electronic equipment and storage medium

Technical Field

The present application relates to the field of computer, artificial intelligence, and game technologies, and in particular, to a data processing method and apparatus, an electronic device, and a storage medium.

Background

The study of efficient geometric three-dimensional representations (also referred to as 3D representations, geometric features, geometric shape features, geometric representations, etc.) is one of the core tasks of computer vision and computer graphics. It spans upper-level applications (e.g., scene understanding, object recognition, object classification, etc.) as well as various underlying tasks (including reconstructing, interpolating, and manipulating 3D (three-dimensional) shapes, etc.).

To adapt to various application scenarios, research on 3D representation has long been a focus of technicians. Various learning methods for 3D representation exist, such as point cloud and voxel representations, mesh-based learning methods, and the more recent learning methods based on neural implicit functions. However, research finds that the 3D representations learned by these existing methods are still not ideal in practical applications, and the learning of 3D representations still needs to be improved.

Disclosure of Invention

An object of the embodiments of the present application is to provide a data processing method, an apparatus, an electronic device, and a storage medium, which can effectively improve the effect and efficiency of processing a three-dimensional object.

In one aspect, an embodiment of the present application provides a data processing method, where the method includes:

acquiring source data corresponding to the three-dimensional object;

based on the source data, acquiring first geometric features of a plurality of nodes corresponding to the source data, wherein the plurality of nodes comprise nodes of a plurality of levels having a tree structure, the nodes of different levels correspond to different spaces in a three-dimensional space of the three-dimensional object, for any parent node in the plurality of nodes, the first geometric feature of the parent node is obtained based on the first structural feature and the first geometric feature of each child node of the parent node, and the first structural feature of one node represents the occupation condition of the surface structure of the three-dimensional object contained in the space corresponding to the node relative to the whole surface of the three-dimensional object;

determining a target node in the plurality of nodes based on the acquired first geometric characteristics of the plurality of nodes, and decoding the first geometric characteristics of the target node to obtain second geometric characteristics corresponding to each sub-node under the target node;

and processing the three-dimensional object based on the second geometric characteristics corresponding to the nodes obtained by decoding to obtain corresponding processing results.

Optionally, the source data may include at least one of data for identifying the three-dimensional object, data for constructing the three-dimensional object, or data for classifying the three-dimensional object.

Optionally, the source data may include at least one of two-dimensional data or three-dimensional data.

Optionally, the source data includes at least one of the following:

point cloud data corresponding to the three-dimensional object; a two-dimensional image of a three-dimensional object; mesh data corresponding to the three-dimensional object.

Optionally, processing the three-dimensional object based on the second geometric features corresponding to the nodes obtained by decoding, so as to obtain a corresponding processing result, includes any one of the following:

constructing a three-dimensional image of the three-dimensional object based on the second geometric characteristics corresponding to the nodes obtained by decoding;

identifying the three-dimensional object based on the second geometric characteristics corresponding to the nodes obtained by decoding to obtain an identification result of the three-dimensional object;

and determining the category of the three-dimensional object based on the second geometric characteristics corresponding to the nodes obtained by decoding.

Wherein constructing the three-dimensional image of the three-dimensional object comprises reconstructing the three-dimensional image of the three-dimensional object or repairing the three-dimensional image of the three-dimensional object.

On the other hand, an embodiment of the present application provides a data processing apparatus, which includes a data acquisition module and a data processing module, wherein: the data acquisition module is used for acquiring source data of the three-dimensional object; the data processing module is used for executing the following operations:

based on the source data, acquiring first geometric characteristics of a plurality of nodes corresponding to the source data, wherein the plurality of nodes comprise nodes of a plurality of levels having a tree structure, the nodes of different levels correspond to different spaces in a three-dimensional space of the three-dimensional object, for any parent node in the plurality of nodes, the first geometric characteristics of the parent node are obtained based on the first structural characteristics and the first geometric characteristics of each child node of the parent node, and the first structural characteristics of one node represent the occupation condition of the surface structure of the three-dimensional object contained in the space corresponding to the node relative to the whole surface of the three-dimensional object; determining a target node in the plurality of nodes based on the acquired first geometric characteristics of the plurality of nodes, and decoding the first geometric characteristics of the target node to obtain second geometric characteristics corresponding to each sub-node under the target node; and processing the three-dimensional object based on the second geometric characteristics corresponding to the nodes obtained by decoding to obtain corresponding processing results.

On the other hand, an embodiment of the present application further provides an electronic device, where the electronic device includes a memory and a processor, where the memory stores a computer program, and the processor executes the data processing method provided in any optional embodiment of the present application when running the computer program.

On the other hand, an embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored in the storage medium, and when the computer program is executed by a processor, the processor executes the data processing method provided in any optional embodiment of the present application.

In another aspect, an embodiment of the present application further provides a computer program product or a computer program which, when run on a computer device, causes the computer device to execute any one of the optional implementation methods provided in the present application. The computer program product or computer program comprises computer instructions, which are stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to execute the data processing method provided in any optional embodiment of the present application.

The technical solutions provided by the present application bring the following beneficial effects. When the geometric three-dimensional representation (i.e., the second geometric feature) corresponding to the three-dimensional object is obtained based on the source data corresponding to the three-dimensional object, the structural information (i.e., the first structural feature) of the surface structure of the three-dimensional object is fused into the first geometric feature of each node used for obtaining the representation. For a node, the structural feature of the surface in the space corresponding to the node can represent the importance of that surface to the whole surface of the three-dimensional object (i.e., the occupation condition, such as the surface occupancy). Therefore, based on the structural information, only the important nodes among the nodes corresponding to the three-dimensional object need to be further decoded. This effectively reduces the data processing amount while the geometric features obtained by decoding still contain rich structural semantic information, so that the three-dimensional object can be processed efficiently and accurately according to the geometric features of the nodes obtained by decoding.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.

FIG. 1 is a schematic flowchart of a data processing method according to an embodiment of the present application;

FIG. 2a is a schematic diagram illustrating a training phase of a model according to an embodiment of the present disclosure;

FIG. 2b is a schematic diagram illustrating a model application phase according to an embodiment of the present application;

FIG. 3 is a diagram illustrating an octree structure corresponding to a three-dimensional object provided in an example of the present application;

FIG. 4 is a schematic diagram illustrating an operation principle of a hierarchical encoder and decoder according to an embodiment of the present application;

FIG. 5 is a schematic diagram of an encoder structure according to an embodiment of the present application;

FIG. 6 is a schematic diagram illustrating a decoder structure provided in an embodiment of the present application;

FIG. 7 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;

FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary, serve only to explain the present application, and are not to be construed as limiting the present application.

As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes any one of, and all combinations of one or more of, the associated listed items.

For better understanding and description of the methods provided by the embodiments of the present application, some technical terms referred to in the embodiments of the present application will be described below:

An octree structure, which may also be referred to simply as an octree, is a tree-like data structure for describing a three-dimensional space. Each node of the octree represents a cubic volume element, and a node that is subdivided has eight child nodes whose volume elements together equal the volume of the parent node. If the octree is not an empty tree, any node in the tree has exactly eight or zero child nodes. For an object, the root node of the octree corresponds to the three-dimensional space surrounding the object, each child node of the root node corresponds to one eighth of that three-dimensional space, and each child node of a child node corresponds to one eighth of the space of its parent. One node in an octree can be called an octant.
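The octree definition above can be illustrated with a minimal Python sketch. The class name `Octant`, its fields, and the child ordering are assumptions made for illustration only; they are not taken from the application.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Octant:
    """One octree node: an axis-aligned cube given by its minimum corner and edge length."""
    origin: Vec3
    size: float
    depth: int = 0
    children: List["Octant"] = field(default_factory=list)

    def volume(self) -> float:
        return self.size ** 3

    def subdivide(self) -> List["Octant"]:
        """Split this cube into eight equal child cubes, one per octant."""
        half = self.size / 2.0
        ox, oy, oz = self.origin
        self.children = [
            Octant((ox + dx * half, oy + dy * half, oz + dz * half), half, self.depth + 1)
            for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)
        ]
        return self.children


# The root cube bounds the object; only some children are refined further.
root = Octant(origin=(0.0, 0.0, 0.0), size=2.0)
kids = root.subdivide()
kids[0].subdivide()
```

In this sketch every node has either zero or eight children, and the volumes of the eight children add up to the volume of their parent.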

The three-dimensional object may be any real object, or may be a virtual object, such as an object (e.g., a table, a chair, an automobile, an airplane, etc.), an animal, a scene space, a game character in a game, a game prop, etc., without limitation in the embodiments of the present application.

Voxelization, which is the conversion of a geometric representation of an object (e.g. an object, a person, a part of a structure of a person, an animal, etc.) into a voxel representation closest to the object, produces a volume data set that contains not only surface information of the object but also internal properties of the object. Spatial voxels are similar to two-dimensional pixels representing images, but extend from two-dimensional points to three-dimensional cubes. For example, for a three-dimensional shape, a bounding box corresponding to the shape (the bounding box is the corresponding three-dimensional space) and a corresponding octree structure may be constructed, and the voxelization processing is implemented based on the octree structure, so as to obtain voxelized data corresponding to each node in the octree.
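As a rough illustration of voxelization, the following Python sketch maps surface sample points to the voxels that contain them. It is a simplification under stated assumptions: it records surface occupancy only (no internal properties), it uses a uniform grid, and the function name `voxelize` and its parameters are hypothetical.

```python
from typing import Iterable, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxelize(points: Iterable[Point], box_min: Point, box_size: float,
             resolution: int) -> Set[Voxel]:
    """Return the indices of all voxels that contain at least one sample point."""
    cell = box_size / resolution
    occupied: Set[Voxel] = set()
    for p in points:
        # Clamp so that points lying on the far faces of the bounding box stay in range.
        idx = tuple(
            min(resolution - 1, max(0, int((p[a] - box_min[a]) / cell)))
            for a in range(3)
        )
        occupied.add(idx)
    return occupied


samples = [(0.1, 0.1, 0.1), (0.15, 0.12, 0.11), (0.9, 0.9, 0.9)]
grid = voxelize(samples, box_min=(0.0, 0.0, 0.0), box_size=1.0, resolution=4)
```

A resolution of 2^d per axis matches the cells of a fully subdivided octree of depth d, so the same index can identify the corresponding octree node at that depth.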

An SDF (signed distance field) is used to determine whether a point in space is inside or outside the region described by a spatially distributed function (i.e., an implicit function); that is, the result computed by an SDF characterizes whether a point is inside or outside a region.
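A minimal Python sketch of the inside/outside test follows, using a sphere as the shape. The sign convention (negative inside, positive outside) is a common one and is assumed here; the application does not specify it. The function names are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float, float]


def sphere_sdf(p: Point, center: Point, radius: float) -> float:
    """Signed distance from p to the sphere surface: negative inside, zero on it, positive outside."""
    return math.dist(p, center) - radius


def is_inside(p: Point, center: Point, radius: float) -> bool:
    """A point is inside the region when its signed distance is negative."""
    return sphere_sdf(p, center, radius) < 0.0
```

The surface itself is the zero level set of the function, which is what links the SDF to the implicit-function view of a shape.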

An implicit function is defined as follows: if the equation F(x, y) = 0 can determine y as a function of x, then the function represented in this way is called an implicit function. In the field of three-dimensional modeling, modeling may be performed by a reconstruction technique for implicit surfaces (an implicit surface being the surface/surface structure referred to in the embodiments of the present application), i.e., an implicit modeling method, which expresses the spatial structure of a model by defining an implicit function on the three-dimensional model space. For an object (e.g., a three-dimensional shape), the geometric structure of a three-dimensional model of the object may be represented by the implicit function. An implicit function corresponding to a surface structure included in one space can be obtained by learning (training). An implicit function that is used to represent only a portion of the entire shape may be referred to as a local implicit function.

A feature may also be referred to as a feature vector, a feature representation, an implicit feature, an implicit vector, and the like; the corresponding English terms are latent feature, latent code, or latent vector (sometimes abbreviated as code). A feature is a mathematical representation of an element (e.g., a point, an object, a space, and the like), and the feature corresponding to an element may be learned through training or other learning manners.

In addition, the data processing method provided by the embodiment of the present application can be applied to various practical application scenarios, solves various practical technical problems, and has practical value. For example, based on the data processing scheme provided by the present application, a three-dimensional structure of an object may be reconstructed based on a two-dimensional image of the object (e.g., a depth image of the object), that is, image processing may be performed. A three-dimensional reconstruction of the object, that is, a shape construction of a three-dimensional object, may also be implemented based on point cloud data of the object. For example, the data processing scheme may be applied to smart city simulation, in which a simulated three-dimensional object is constructed based on a two-dimensional image or point cloud data of an object in a smart city. The data processing scheme may also be applied to a game scene to construct a three-dimensional object in the game scene. Repair of a three-dimensional object may also be implemented based on partial three-dimensional data of the object (e.g., point cloud or mesh data), and identification of the object, classification of the object, and the like may also be implemented according to source data of the object (e.g., a two-dimensional image or point cloud data, etc.).

In the methods provided in the various optional embodiments of the present application, the steps related to feature extraction (including encoding or decoding of features, etc.) may be implemented by using artificial intelligence techniques. The data processing (including data storage, data calculation, and the like) according to the embodiments of the present application may be implemented by using cloud technology; for example, the data storage may use cloud storage, and the data calculation may use cloud computing.

Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the implementation method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.

Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. The basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision technology, speech processing technology, natural language processing technology, machine learning/deep learning, and the like.

The scheme provided by the embodiment of the application may particularly relate to the technical field of computer vision in artificial intelligence. Computer vision (CV) is a science that studies how to make a machine "see"; more specifically, it refers to using cameras and computers in place of human eyes to identify, track, and measure targets, and to further perform image processing so that the processed result becomes an image more suitable for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can obtain information from images or multidimensional data. Computer vision technology generally includes image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, and the like, and also includes common biometric technologies such as face recognition and fingerprint recognition.

The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.

Fig. 1 shows a schematic flowchart of a data processing method provided in an embodiment of the present application, and as shown in fig. 1, the method may include the following steps:

step S110: acquiring source data corresponding to the three-dimensional object;

The three-dimensional object may be any physical object or virtual object. The source data corresponding to the three-dimensional object is data related to the shape structure of the three-dimensional object, and identification, classification, or construction (including reconstruction or completion, i.e., repair, etc.) of the three-dimensional object may be implemented based on the source data. The source data may be two-dimensional data or three-dimensional data.

Alternatively, the source data of the three-dimensional object may be at least one of point cloud data, a two-dimensional image, or mesh data of the three-dimensional object.

Two-dimensional data refers to data on a two-dimensional plane space, namely a data form under a two-dimensional coordinate system, and three-dimensional data refers to data in a three-dimensional space, namely a data form under a three-dimensional coordinate system. For example, the two-dimensional data may be a two-dimensional image, the specific data content may be pixel values, depth values, etc. of each point (two-dimensional coordinate point) in the image, the three-dimensional data may be point cloud data, such as color information and reflection intensity information of each point (three-dimensional coordinate point) in a three-dimensional space, and the information content of the three-dimensional coordinate point contained in the point cloud data may be different for different data acquisition manners. The mesh data refers to data defined on a mesh, that is, data obtained by performing a meshing process on initial data, and in computer graphics, a mesh is a basic representation manner, for example, point cloud data may be subjected to a meshing process to obtain corresponding mesh data.

Step S120: acquiring first geometric characteristics of a plurality of nodes corresponding to source data based on the source data of the three-dimensional object;

the plurality of nodes include nodes of a plurality of hierarchies in a tree structure, for a parent node in the plurality of nodes, the first geometric feature of the parent node is obtained based on the first structural feature and the first geometric feature of each child node of the parent node, and the first structural feature of one node represents an occupation situation of a surface structure of a three-dimensional object included in a space corresponding to the node with respect to an overall surface of the three-dimensional object (which may also be understood as an importance degree of a surface included in the space corresponding to the node with respect to the three-dimensional object). Optionally, the first structural feature may further characterize the geometric complexity of the surface structure contained in the space corresponding to the node.

For any node, the first geometric feature of the node characterizes the geometric shape contained in the space corresponding to the node, that is, the shape feature of the shape (i.e., the surface structure) contained therein. In the solution provided in the embodiment of the present application, a bottom-up encoding manner may be used to obtain the first geometric features of the nodes at each level. Specifically, the first geometric features of nodes that have no child nodes may be obtained first. For a parent node that has child nodes, the first geometric feature of the parent node may be obtained by fusing the first geometric features and the first structural features of its child nodes. Optionally, the feature representations (including the first geometric features and the first structural features) of all child nodes of the parent node may be concatenated, and feature extraction is performed on the concatenated features to obtain the first geometric feature of the parent node. The first geometric feature obtained in this way therefore fuses the structural information and the geometric shape information of the surface structure contained in the space and contains richer structural semantic information, so that the representation information (i.e., the second geometric feature described below) corresponding to the three-dimensional object can be obtained accurately based on the feature, which provides better basic support for improving the processing result of the three-dimensional object.
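The bottom-up encoding described above can be sketched in Python as a post-order traversal. This is only an illustration under stated assumptions: `toy_extract` is a hypothetical fixed averaging function standing in for the trained feature-extraction network, and the `Node` class, its field names, and the feature dimensions are invented for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    structure: List[float]                   # first structural feature of this node
    geometry: Optional[List[float]] = None   # first geometric feature of this node
    children: List["Node"] = field(default_factory=list)


def toy_extract(vector: List[float], out_dim: int) -> List[float]:
    """Hypothetical stand-in for a learned layer: average every out_dim-th entry."""
    return [sum(vector[j::out_dim]) / len(vector[j::out_dim]) for j in range(out_dim)]


def encode(node: Node, dim: int) -> List[float]:
    """Encode children first, then fuse their features into the parent."""
    if not node.children:
        if node.geometry is None:
            raise ValueError("leaf nodes need an initial geometric feature")
        return node.geometry
    concat: List[float] = []
    for child in node.children:
        concat.extend(encode(child, dim))   # child's first geometric feature
        concat.extend(child.structure)      # child's first structural feature
    node.geometry = toy_extract(concat, dim)
    return node.geometry


def leaf() -> Node:
    return Node(structure=[0.5, 0.5], geometry=[2.0, 4.0])


inner = Node(structure=[0.5, 0.5], children=[leaf() for _ in range(8)])
root = Node(structure=[0.5, 0.5], children=[inner] + [leaf() for _ in range(7)])
encode(root, dim=2)
```

Because the recursion reaches the leaves before returning, the feature of `inner` is available by the time `root` concatenates the features of its eight children.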

In the embodiment of the present application, the nodes of multiple levels having a tree structure may be understood as multiple feature vectors (i.e., feature representations) extracted based on the source data, among which different hierarchical (parent-child) relationships are implied.

For better understanding, the nodes of multiple levels having a tree structure are described below with reference to an example. For any three-dimensional object, a cube surrounding the three-dimensional object may be constructed, the cube may be understood as a three-dimensional space of the three-dimensional object, that is, a three-dimensional space occupied by the three-dimensional object, a corresponding octree structure may be constructed based on the cube, an initial cube surrounding the three-dimensional object may correspond to a root node (in the embodiment of the present application, the root node is referred to as a node of the highest hierarchy) of a tree structure (that is, an octree), each child node (a node of a next hierarchy of the root node) of the root node corresponds to one of eight subspaces obtained by dividing the initial cube, and similarly, a node of a next hierarchy of each child node is a subspace obtained by dividing a space corresponding to the child node again. Correspondingly, the feature representation corresponding to each child node in the octree structure is the feature vector of the space corresponding to the child node. It is understood that, in practical applications, a space corresponding to one sub-node may or may not include part or all of the surface structure of the three-dimensional object (i.e., the surface of the three-dimensional object).
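The correspondence between nodes and subspaces can be illustrated by locating the chain of cells that contains a given point. In the Python sketch below, the function name `octant_path` and the bit layout of the child index (bit 0 for x, bit 1 for y, bit 2 for z) are assumptions chosen for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def octant_path(p: Point, box_min: Point, box_size: float, depth: int) -> List[int]:
    """Return the child indices (0-7) from the root down to the depth-`depth` cell containing p."""
    path: List[int] = []
    corner = list(box_min)
    size = box_size
    for _ in range(depth):
        size /= 2.0                       # each level halves the cube edge
        index = 0
        for axis in range(3):
            if p[axis] >= corner[axis] + size:
                index |= 1 << axis        # point lies in the upper half along this axis
                corner[axis] += size
        path.append(index)
    return path


path_a = octant_path((0.9, 0.1, 0.1), (0.0, 0.0, 0.0), 1.0, depth=2)
path_b = octant_path((0.1, 0.6, 0.8), (0.0, 0.0, 0.0), 1.0, depth=2)
```

Each entry of the returned path selects one of the eight subspaces at one level, so a path of length d identifies a node d levels below the root.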

Optionally, for any node, the first structural feature corresponding to the node characterizes the importance, to the three-dimensional object, of the surface structure of the three-dimensional object contained in the space corresponding to the node, and the complexity of the geometric shape of that surface structure. The importance may be understood as whether the surface of the three-dimensional object is included in the space corresponding to the node and, if so, how important the included surface is to the three-dimensional object. Optionally, the importance may be expressed by a surface occupancy rate, that is, the proportion of the surface included in the space corresponding to the node relative to the entire surface of the three-dimensional object, for example, an area occupancy rate; the higher the occupancy rate, the higher the importance. The geometric complexity of the surface structure, i.e., the complexity of the surface shape (the richness of the geometric shape, which may be referred to as geometric richness for short), may also be understood in terms of the smoothness of the surface: the smoother the surface, the lower the complexity. Optionally, the complexity of the surface shape may be expressed by the amount of variation of the surface normal, with a greater amount indicating greater complexity. The first geometric feature corresponding to a node is a feature vector representing the geometric shape of the surface structure included in the space corresponding to the node.
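One possible way to compute such a structural feature from sampled surface points is sketched below in Python. The specific measures are assumptions, not the application's formulas: occupancy is approximated as the fraction of surface samples falling in the cell (which approximates an area ratio only if the samples are area-uniform), and normal variation is taken as one minus the length of the mean unit normal. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def in_cell(p: Vec3, cell_min: Vec3, size: float) -> bool:
    """True if p lies in the half-open cube [cell_min, cell_min + size) on every axis."""
    return all(cell_min[a] <= p[a] < cell_min[a] + size for a in range(3))


def structural_feature(points: Sequence[Vec3], normals: Sequence[Vec3],
                       cell_min: Vec3, size: float) -> Tuple[float, float]:
    """Return (surface occupancy, normal variation) for one cubic cell."""
    inside = [n for p, n in zip(points, normals) if in_cell(p, cell_min, size)]
    if not inside:
        return 0.0, 0.0                      # the cell contains no surface
    occupancy = len(inside) / len(points)
    mean = [sum(n[a] for n in inside) / len(inside) for a in range(3)]
    variation = 1.0 - math.sqrt(sum(m * m for m in mean))
    return occupancy, variation


pts = [(0.2, 0.2, 0.5), (0.7, 0.3, 0.5), (1.2, 0.5, 0.5), (1.5, 0.5, 0.5)]
nrm = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
flat = structural_feature(pts, nrm, (0.0, 0.0, 0.0), 1.0)
bent = structural_feature(pts, nrm, (1.0, 0.0, 0.0), 1.0)
empty = structural_feature(pts, nrm, (2.0, 0.0, 0.0), 1.0)
```

In this toy data both non-empty cells hold half of the samples, but only the second cell has normals pointing in different directions, so only it receives a non-zero variation value.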

In the embodiment of the present application, the step of obtaining, based on source data of a three-dimensional object, first geometric features of nodes having multiple levels of a tree structure corresponding to the source data may be implemented by a neural network model (referred to as a feature mapping/extraction model for short) obtained through training, and the source data of the three-dimensional object is input into the trained model to obtain the first geometric features of the nodes of the corresponding levels.

The specific network structure of the feature mapping model is not limited in the embodiment of the present application. As an alternative, the feature mapping model may be a model based on a VAE (Variational Auto-Encoder) principle, and the VAE may be used as a last hidden layer of the feature mapping model, and the first geometric feature meeting the requirement is obtained based on a normal distribution parameter output by the VAE.
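For readers unfamiliar with VAEs, the Python sketch below shows the standard reparameterization step by which a latent vector is drawn from predicted normal distribution parameters. It is an assumption that the model in this embodiment follows this standard step; the function name `sample_latent` and the use of a log-variance parameterization are illustrative choices.

```python
import math
import random
from typing import List, Sequence


def sample_latent(mu: Sequence[float], log_var: Sequence[float],
                  rng: random.Random) -> List[float]:
    """Reparameterization: z = mu + sigma * eps, with eps drawn from N(0, 1)."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, log_var)
    ]


rng = random.Random(0)
mu = [0.3, -1.2, 2.0]
# A very negative log-variance gives a near-zero standard deviation, so z is close to mu.
z_narrow = sample_latent(mu, [-100.0, -100.0, -100.0], rng)
z_wide = sample_latent(mu, [0.0, 0.0, 0.0], rng)
```

Writing the sample as a deterministic function of the parameters plus independent noise is what allows gradients to flow through the sampling step during training.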

Step S130: determining a target node in the plurality of nodes based on the acquired first geometric characteristics of the plurality of nodes, and decoding the characteristic representation of the target node to obtain second geometric characteristics corresponding to each sub-node under the target node;

the target node includes a root node in the plurality of nodes and a node satisfying a set condition in the plurality of nodes, where the node satisfying the set condition means that a surface structure included in a space corresponding to the node satisfies the set condition, and optionally, the node satisfying the set condition may include that the space corresponding to the node includes the surface structure, or that the space includes the surface structure and the complexity of the geometric shape of the surface structure is greater than or equal to a complexity threshold (or the complexity is a specified value). Whether a node satisfies a set condition may be determined based on a second structural feature corresponding to the node, where the second structural feature corresponding to the node may be determined based on a first geometric feature of a parent node of the node, and may be specifically obtained by decoding the first geometric feature of the parent node of the node (which will be described in conjunction with specific embodiments and will not be described here). It should be noted that the meaning of the second structural feature is the same as that of the first structural feature, except that the first structural feature is an initialized feature, and the second structural feature is a result of further processing based on the first geometric feature of the node.

Since most 3D shapes (an example of 3D objects) are usually composed of large smooth areas and large-scale sharp features, their surfaces usually occupy only a small portion of the entire space (i.e., the three-dimensional space mentioned above), and the space occupation is extremely sparse. If a space does not contain the surface, or the contained surface has low importance (for example, a small surface occupancy rate), the space need not be processed further in subsequent steps (for example, the structure in the space need not be reconstructed), which reduces the amount of data to be processed and improves processing efficiency while leaving the processing result essentially unaffected.

In view of the above factors, in the scheme provided in the embodiment of the present application, the first geometric features obtained for the plurality of nodes are fused with the structural features, and the structural features can represent the importance of the surface structure in the corresponding space (reflecting surface sparsity) and, optionally, the geometric richness of the surface structure. Therefore, on the basis of the obtained first geometric feature of each node, the first geometric features are decoded while taking into account both the surface richness and the sparsity corresponding to the node, and the target nodes that need to be decoded further are selected through the decoding process. That is, the important target nodes are determined by making full use of the structural semantics of the 3D shape, so that only the target nodes containing rich structural semantics need to be processed subsequently. This effectively reduces the amount of data to be processed, greatly improves data processing efficiency, and better meets practical application requirements.

For example, in an application of implementing three-dimensional image reconstruction of an object based on the scheme provided by the embodiment of the present application, a semantic structure of a 3D shape may be fully utilized to provide useful guidance for surface modeling, processing may be performed to different degrees according to the complexity of a geometric shape enclosed in a space, and the accuracy of 3D modeling may be ensured while reducing the occupation of computational resources (such as memory).

Step S140: and processing the three-dimensional object based on the second geometric characteristics corresponding to the nodes obtained by decoding to obtain corresponding processing results.

According to the scheme provided by the embodiment of the application, when the geometric three-dimensional representation corresponding to the three-dimensional object (i.e., the second geometric features corresponding to the decoded nodes) is obtained based on the source data corresponding to the three-dimensional object, both the structural information (i.e., the first structural feature) and the shape information (i.e., the first geometric feature) of the surface structure of the three-dimensional object are considered. For a node, the structural feature of the surface in the corresponding space can represent the importance (such as the surface occupancy rate) of that surface to the whole surface of the three-dimensional object. Therefore, based on these two kinds of information, only the important nodes among the nodes corresponding to the three-dimensional object need to be decoded further, and efficient and accurate processing of the three-dimensional object is realized according to the geometric features of the nodes obtained by decoding.

In the embodiment of the present application, the processing of the three-dimensional object may be any processing that needs to be performed based on the feature representation (i.e., the second geometric feature) of the three-dimensional object. Optionally, the processing of the three-dimensional object based on the second geometric features corresponding to the nodes obtained by decoding to obtain corresponding processing results includes any one of the following:

Wherein constructing a three-dimensional image of a three-dimensional object may comprise reconstructing the three-dimensional object or repairing the three-dimensional object (i.e. shape completion of the three-dimensional object). For example, reconstruction of a three-dimensional image of an object may be achieved based on a two-dimensional image of the object (e.g., a two-dimensional image including depth information and color information), or shape completion of a three-dimensional object may be achieved from partial three-dimensional data of the object (e.g., point cloud data, mesh data, etc.). The realization of the identification, classification, and the like of the object based on the feature representation is also a common practical application, for example, the identification of the object can be realized according to part or all of the point cloud data of the object, and the description thereof is not repeated.

In an optional embodiment of the present application, constructing a three-dimensional image of the three-dimensional object based on the second geometric features corresponding to the nodes obtained by decoding may specifically include:

acquiring a query point data set, wherein the query point data set comprises three-dimensional position information of a plurality of points to be queried;

determining the position relation of the to-be-queried point and the surface structure contained in the space corresponding to each decoded node based on the decoded second geometric characteristics corresponding to each node and the three-dimensional position information of each to-be-queried point;

and constructing a three-dimensional image of the three-dimensional object based on the determined position relation of each point to be queried.

The query point may be any point in the space, and may be acquired by a sampling manner, for example, uniform sampling may be performed in a specified space range, and three-dimensional position information of each sampling point is recorded, and the sampling points may be used as the query point, and the three-dimensional position information (which may be referred to as a 3D query position) of the query point may be a three-dimensional world coordinate, or a position coordinate in the space where the point is located, or a three-dimensional coordinate after some preprocessing, such as a three-dimensional coordinate after normalization processing is performed on the three-dimensional world coordinate. For any point to be queried and a node, based on the second geometric feature corresponding to the node and the 3D query position of the query point, a position relationship between the query point and the surface structure in the space corresponding to the node is determined, where the position relationship may be a distance between the query point and the surface structure, such as an SDF distance, and based on the position relationship, it may be further known whether the query point is a point constituting the surface (i.e., a point located on the surface), or is located in front of the surface (front side) or behind the surface (rear side), so that the construction of the 3D surface structure of the three-dimensional object may be implemented according to the position relationship between each point to be queried and the surface structure corresponding to each node.
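The uniform sampling and normalization mentioned above could look like the following sketch, which samples a regular grid in an axis-aligned box and rescales coordinates to the unit cube. The function names and the choice of a regular grid are assumptions for illustration; the application does not prescribe a particular sampling scheme.

```python
import itertools

def uniform_query_points(lo, hi, n):
    """Return n * n * n evenly spaced 3D points in the box [lo, hi]."""
    if n < 2:
        raise ValueError("n must be at least 2")
    axes = [[l + (h - l) * i / (n - 1) for i in range(n)]
            for l, h in zip(lo, hi)]
    return list(itertools.product(*axes))

def normalize_points(points, lo, hi):
    """Map world coordinates in [lo, hi] to normalized coordinates in [0, 1]."""
    return [tuple((c - l) / (h - l) for c, l, h in zip(p, lo, hi))
            for p in points]
```

For example, `uniform_query_points((-1, -1, -1), (1, 1, 1), 3)` yields the 27 grid points of a 3 x 3 x 3 lattice, which `normalize_points` then maps into the unit cube.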

Optionally, for any point to be queried, the geometric feature corresponding to a node and the 3D query position of the query point may be concatenated and input into a trained decoding model, which predicts the position relationship between the query point and the surface corresponding to the target node. The model output may be a feature value (which may also be referred to as an SDF value) representing the SDF distance corresponding to the query point. For example, the feature value may be a real number in the range 0 to 1: a value less than 0.5 indicates that the query point lies in front of the surface corresponding to the target node, a value greater than 0.5 indicates that it lies behind that surface, and a value equal to 0.5 indicates that it lies on the surface. After the feature value corresponding to each query point is determined, the 3D query positions and corresponding feature values of the query points may be converted into a mesh (for example, by inputting them into a mesh conversion model), so that the corresponding 3D structure is reconstructed. Of course, the feature value of the SDF distance may also be expressed in other manners; for example, a positive feature value may indicate that the query point is in front of the surface, a negative value that it is behind the surface, and 0 that it is on the surface.
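The two value conventions described above can be sketched as follows. The helper names and the tolerance `eps` are hypothetical; the trained decoding model itself is not shown, only the construction of its concatenated input and the interpretation of its output.

```python
def build_decoder_input(node_feature, query_position):
    """Concatenate a node's geometric feature with a 3D query position."""
    return list(node_feature) + list(query_position)

def classify_unit_range(value, eps=1e-6):
    """Interpret a feature value in [0, 1]: 0.5 marks the surface."""
    if abs(value - 0.5) <= eps:
        return "on"
    return "front" if value < 0.5 else "behind"

def classify_signed(value, eps=1e-6):
    """Interpret a signed feature value: 0 marks the surface."""
    if abs(value) <= eps:
        return "on"
    return "front" if value > 0 else "behind"
```

Both conventions carry the same information; the tolerance is only there because a predicted value is rarely exactly equal to the surface value.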

In an optional embodiment of the application, the determining a target node in the multiple nodes based on the obtained first geometric features of the multiple nodes may include:

according to the sequence of the hierarchy from high to low, decoding the first geometric features of the father node in the nodes to obtain second structural features and second geometric features of a plurality of child nodes under the father node, and determining a target node in the child nodes based on the second structural features of each child node.

Optionally, in order to better determine a target node among the plurality of nodes corresponding to the three-dimensional object, that is, a node whose corresponding space contains rich geometric shape, the target node may be determined by recursive decoding. Following the order of levels in the tree structure from high to low (the root node being the node at the highest level), the feature representation of a parent node is decoded to obtain the second structural features of the plurality of child nodes under the parent node, and whether any of the child nodes needs to be decoded further is determined according to these structural features. It may be understood that a target node is a node whose feature representation needs to be decoded further, that is, a node whose corresponding space needs to be further subdivided. If such a node exists, the decoding and determination are repeated for that target node until no node that needs further decoding remains. The determined target nodes include the root node.

Optionally, the determining a target node in the multiple nodes based on the obtained first geometric features of the multiple nodes, and decoding the first geometric features of the target node to obtain second geometric features corresponding to each child node under the target node, is implemented by taking a root node in the multiple nodes as an initial node to be processed according to a sequence from high to low of a hierarchy corresponding to the multiple nodes, and repeatedly performing the following operations:

decoding to obtain the implicit characteristics of each child node of the node based on the first geometric characteristics of the current node to be processed;

for each child node, decoding to obtain a second structural feature corresponding to the child node based on the implicit feature corresponding to the child node;

determining a target child node in each child node based on the second structural characteristics of each child node;

decoding to obtain a second geometric feature corresponding to each child node based on the implicit feature of each child node;

respectively taking each target child node as a new node to be processed;

and determining target nodes in the plurality of nodes, wherein the target nodes comprise root nodes and target child nodes.
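The repeated operations above can be sketched as a breadth-first loop. This is a minimal sketch under stated assumptions: the four callables stand in for trained decoder components and are hypothetical, nodes are identified by their child-index path from the root, the implicit feature of a target child is reused as its feature when it becomes the new node to be processed, and `max_depth` is a safety bound added for the example.

```python
from collections import deque

def hierarchical_decode(root_feature, decode_children, decode_structure,
                        decode_geometry, is_target, max_depth=8):
    """Top-down decoding in which only target children are expanded further."""
    second_geometry = {}   # node path -> second geometric feature
    targets = [()]         # the root (empty path) is always a target node
    queue = deque([((), root_feature)])
    while queue:
        path, feature = queue.popleft()
        if len(path) >= max_depth:
            continue
        # Decode the implicit feature of each child of the current node.
        for idx, hidden in enumerate(decode_children(feature)):
            child = path + (idx,)
            structure = decode_structure(hidden)          # second structural feature
            second_geometry[child] = decode_geometry(hidden)
            if is_target(structure):
                targets.append(child)
                queue.append((child, hidden))             # new node to be processed
    return targets, second_geometry
```

Note that a second geometric feature is recorded for every child of a target node, whether or not that child is itself a target; only the expansion step is skipped for non-target children.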

The decoding method for determining the target node and the node corresponding thereto provided in the optional embodiment of the present application may be implemented by a trained neural network model (i.e., a decoding model/decoder), and since a hierarchical decoding method is adopted, the adopted model may also be referred to as a hierarchical decoder, and the decoding method may also be referred to as hierarchical feature decoding.

By adopting the decoding method from top to bottom provided by the embodiment of the application, which nodes need to be further decoded can be determined step by step according to the second structural characteristics of the decoded nodes, and because the structural characteristics can reflect the importance (namely the surface occupancy rate) of the surface structure contained in the space corresponding to the node on the three-dimensional object and the complexity of the surface geometry of the surface structure, if it is determined that one node needs to be further decoded, it can be shown that the space corresponding to the node contains surfaces and the surface geometry is rich, and the space corresponding to the node needs to be further subdivided. By adopting the method, only the node which needs to be further decoded in the plurality of nodes can be decoded continuously, and the node which does not need to be further decoded and the child node of the node do not need to occupy the equipment resource for decoding continuously, so that the data processing amount can be effectively reduced, and the equipment resource is saved.

To make this manner easier to understand, it is further explained below taking the root node as the node to be processed. Specifically, the first geometric feature of the root node is decoded to obtain the implicit feature (i.e., hidden vector) of each child node of the root node; that is, the implicit features of the child nodes under a parent node are obtained by decoding the parent node. The second structural feature of each child node can then be obtained by further processing (i.e., continued decoding) of the hidden vector of that child node. For example, for each child node, the hidden vector of the child node can be input into a trained classifier, and the output feature of the classifier represents whether the space corresponding to the child node needs to be further subdivided; that is, the output feature of the classifier can be used as the second structural feature corresponding to the child node. Whether the child node is a target node can thus be determined from the structural feature: if the output feature of the classifier indicates that the space corresponding to the child node needs to be further subdivided, the child node is a target node; otherwise it is not (i.e., neither the child node nor the nodes under it need to be decoded further). After the target nodes among the child nodes of the root node are determined, each target node is taken as a node to be processed in the same way, and the process is repeated until no node needing subdivision remains. The second geometric feature of each child node can be obtained by further decoding based on the implicit feature of the child node.

As can be seen from the above examples, by using the decoding method provided by the present application, it can be determined whether to perform further decoding processing on the node based on the structural features of the node, and if not, the child nodes under the node and the child nodes of the child nodes do not need to perform decoding processing, and because the space corresponding to the non-target node does not include the surface structure of the three-dimensional object, or the surface structure included in the space does not have importance for the three-dimensional object (i.e., the surface occupancy is very low), the processing efficiency can be obviously improved without affecting the processing effect, and the occupation of the computing resources is reduced. It is understood that the second geometric feature of each node obtained by decoding includes the second geometric feature of the child node (the child node may be the target node, and may not be the target node) of each target node.

In an optional embodiment of the present application, the second structural feature includes a first feature value, where the first feature value represents the occupation situation corresponding to the node, that is, whether the space corresponding to the node includes or does not include the surface structure of the three-dimensional object; correspondingly, the determining the target child node in each child node based on the second structural feature of each child node includes:

for any child node, if it is determined based on the first feature value of the child node that the space corresponding to the child node includes the surface structure of the three-dimensional object, taking the child node as a target child node.

In an optional embodiment of the application, the second structural feature includes a first feature value and a second feature value, the first feature value characterizes the occupation situation corresponding to the node, the occupation situation is whether the space corresponding to the node includes a surface structure of the three-dimensional object, that is, the space corresponding to the node includes the surface structure of the three-dimensional object or does not include the surface structure of the three-dimensional object, and the second feature value characterizes a geometric complexity of the surface structure of the three-dimensional object included in the space corresponding to the node.

Determining a target child node in each child node based on the second structural feature of each child node, including:

for a child node, if it is determined based on the first feature value corresponding to the child node that the space corresponding to the child node includes the surface structure of the three-dimensional object, and it is determined based on the second feature value corresponding to the child node that the geometric complexity of the surface structure indicates that the space corresponding to the child node needs to be subdivided, determining the child node as a target child node.

As can be seen from the foregoing description, in an alternative embodiment, the structural features (the first structural feature, the second structural feature) reflect two aspects of information of the space corresponding to the node, one is the occupancy rate corresponding to the surface included in the space, and one is the complexity level of the geometric shape corresponding to the surface, so that the structural features can be represented by two feature values, one is the occupancy rate and one is the complexity level of the geometric shape. Optionally, the first feature value may be a feature value representing whether a space corresponding to the node includes a surface structure of the three-dimensional object, that is, the surface structure of the three-dimensional object is included or not included, that is, the first feature value may be a feature value corresponding to two categories, for example, if the feature value is greater than or equal to a certain set value, it is determined that the space corresponding to the node includes the surface structure, that is, a part of the space corresponding to the node that encloses the surface of the three-dimensional object is included. The second feature value may also be a feature value corresponding to a binary classification, where the value characterizes that the complexity of the surface structure corresponding to the node is sufficiently complex, and the corresponding space needs to be further subdivided (that is, the geometry of the surface enclosed in the space of the node needs to be further subdivided).

For a node, if the space corresponding to the node needs to be further subdivided according to the first characteristic value and the second characteristic value corresponding to the node, the node is a target node. The second geometric feature corresponding to the child node can be obtained by further decoding based on the implicit feature of the child node.
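The decision rule for a target child node can be written compactly. In this sketch the two feature values are assumed to be classifier scores in [0, 1], and the thresholds of 0.5 are illustrative set values, not values given by the application. Setting `use_complexity=False` corresponds to the embodiment in which the second structural feature contains only the first feature value.

```python
def is_target_child(occupancy_score, subdivide_score=0.0,
                    occupancy_threshold=0.5, subdivide_threshold=0.5,
                    use_complexity=True):
    """Return True if the child's space should be subdivided (decoded further)."""
    occupied = occupancy_score >= occupancy_threshold
    if not use_complexity:
        # Embodiment with only the first feature value: occupancy decides.
        return occupied
    # Embodiment with both feature values: occupied and sufficiently complex.
    return occupied and subdivide_score >= subdivide_threshold
```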

In an optional embodiment of the present application, a target node in a plurality of nodes is determined based on the obtained first geometric features of the plurality of nodes, and obtaining second geometric features corresponding to each child node under the target node by decoding the first geometric features of the target node is implemented by a decoding model, where the decoding model is obtained by training in the following manner:

acquiring a training data set, wherein each sample of the training data set comprises a three-dimensional shape and a corresponding training label, and the training labels correspond to the processing mode of the three-dimensional object;

constructing an octree structure corresponding to each three-dimensional shape, and acquiring node data corresponding to each sample node of the octree structure, wherein for each sample node, the node data comprises voxelized data of a space corresponding to the sample node and first structural characteristics corresponding to the sample node;

training a neural network model based on node data and training labels corresponding to each sample, wherein the neural network model comprises a coding module (namely an encoder) and a decoding module (namely a decoder) which are cascaded, the decoding module comprises a first decoding module, and the trained decoding module is used as the decoding model;

for one sample, the input of the coding module comprises node data corresponding to the sample, the output of the coding module is a first geometric characteristic corresponding to each sample node, the input of the first decoding module comprises the first geometric characteristic of each sample node in an octree structure, and the output of the first decoding module comprises a second geometric characteristic corresponding to each sample node obtained through decoding.

Optionally, for a sample node, the first structural feature corresponding to the sample node may be determined by calculating the normal variation of the surface contained in the space corresponding to the node, and the structural feature may include two feature values. If the space corresponding to the node includes the surface of the three-dimensional shape, the first feature value may be set to a first value (e.g., 1); if it does not, the first feature value may be set to a second value (e.g., 0). Further, if the space corresponding to the node includes the surface of the three-dimensional shape, the second feature value may be determined to be a third value or a fourth value according to the normal variation of the surface: for example, if the normal variation is greater than or equal to a set threshold, the second feature value is the third value (e.g., 1); otherwise, the second feature value is the fourth value (e.g., 0). If the space does not include the surface of the three-dimensional shape, the second feature value is the fourth value. In this alternative, the initialized binary flags of the surface structure in the space corresponding to the node, i.e., the first structural feature, can be determined.
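The initialization of the binary flags can be sketched as below, using the example values 1 and 0 given above. The function name and the way the normal variation and threshold are supplied are assumptions for illustration.

```python
def first_structural_feature(contains_surface, normal_variation, threshold):
    """Return (first feature value, second feature value) as binary flags."""
    if not contains_surface:
        # No surface in the space: second value (0) and fourth value (0).
        return (0, 0)
    # Surface present: first value (1); complexity flag from the normal variation.
    return (1, 1 if normal_variation >= threshold else 0)
```

The three possible outcomes are (0, 0) for an empty space, (1, 0) for a space containing a smooth surface, and (1, 1) for a space containing a geometrically complex surface.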

For each sample, the training label refers to a real result corresponding to the sample, and it can be understood that, for different processing requirements, the training labels of the samples can be set according to actual requirements, and the labels of different required samples may be the same or different. For example, the geometric feature output by the decoding model is used for reconstructing a three-dimensional object, during training, the training label may include a label for representing a real surface structure of the three-dimensional shape, a prediction result for representing the surface structure of the three-dimensional shape may be obtained based on the geometric feature output by the decoding module, a reconstruction loss (also referred to as a geometric loss) corresponding to the three-dimensional shape may be calculated according to the label and the prediction result, that is, a loss of a difference between the real result and the prediction result of the surface structure representing the three-dimensional shape, a model parameter of the neural network model may be adjusted based on the training loss, and the training process may be repeated until a training end condition is satisfied.

Of course, during training, other training labels and prediction results corresponding to the labels can be used for calculating other associated losses so as to improve the performance of the model. For example, the training loss may further include a structure loss, a subdivision loss, and the like, where the structure loss and the subdivision loss characterize a difference between a true structure feature (a first structure feature) of each target node corresponding to the sample and a second structure feature of each target node predicted by the decoding module in the training process.
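One way the losses named above might be combined is sketched here. The application does not specify the loss functions or their weights, so the mean absolute error for the geometric loss, the binary cross-entropy for the structure and subdivision losses, and the weights `w_struct` and `w_sub` are all assumptions of this example.

```python
import math

def _mean(values):
    values = list(values)
    return sum(values) / len(values)

def binary_cross_entropy(pred, target, eps=1e-7):
    """Cross-entropy between a predicted probability and a 0/1 label."""
    p = min(max(pred, eps), 1.0 - eps)   # clip to avoid log(0)
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))

def total_loss(sdf_pred, sdf_true, occ_pred, occ_true, sub_pred, sub_true,
               w_struct=1.0, w_sub=1.0):
    """Geometric loss plus weighted structure and subdivision losses."""
    geometric = _mean(abs(p - t) for p, t in zip(sdf_pred, sdf_true))
    structure = _mean(binary_cross_entropy(p, t)
                      for p, t in zip(occ_pred, occ_true))
    subdivision = _mean(binary_cross_entropy(p, t)
                        for p, t in zip(sub_pred, sub_true))
    return geometric + w_struct * structure + w_sub * subdivision
```

Here the occupancy and subdivision labels play the role of the first structural feature, and the predictions play the role of the second structural feature produced by the decoding module.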

It should be noted that, in practical applications, if the source data corresponding to the three-dimensional object is also convertible into an input that meets the input format requirement of the encoding module, the trained encoding module may also be applied to the processing of the source data, that is, data that meets the input requirement of the encoding module may be obtained first based on the source data corresponding to the three-dimensional object, and then the data is input to the trained encoding module, so as to obtain a feature representation of a plurality of levels of nodes having a tree structure corresponding to the source data, where the tree structure is an octree structure, and the nodes of the plurality of levels are each node of each level in the octree structure. And then, inputting the output of the coding module to a trained decoding module to obtain a second geometric characteristic corresponding to a target node in a plurality of nodes in the octree structure.

In an optional embodiment of the present application, the decoding module further includes a second decoding module cascaded with the first decoding module, and during training, the input of the second decoding module includes the output of the first decoding module and three-dimensional position information of the plurality of sample query points, and the output is a position relationship of a surface structure included in a space corresponding to each node obtained by decoding and the plurality of sample query points;

correspondingly, the processing of the three-dimensional object based on the second geometric features corresponding to the nodes obtained by decoding to obtain corresponding processing results includes:

acquiring query point data corresponding to each node obtained based on decoding, wherein the query point data comprises three-dimensional position information of a plurality of points to be queried;

for each node obtained through decoding, determining the position relation of each point to be queried and a surface structure contained in a space corresponding to the node through a trained second decoding module based on a second geometric characteristic corresponding to the node and the three-dimensional position information of each point to be queried;

and constructing a three-dimensional image of the three-dimensional object based on the determined position relation of the points to be inquired.

When the actual application scene is a three-dimensional object construction scene, in order to obtain the position relationship between the query point in the space and the surface structure of the three-dimensional object, the decoding part of the neural network model can also comprise a second decoding module cascaded with the first decoding module besides a first decoding module for decoding to obtain the second geometric characteristics corresponding to each node, and the position relationship between the query point and the surface structure contained in the space corresponding to the node can be obtained by inputting the second geometric characteristics corresponding to the node decoded by the first decoding module and the three-dimensional position information of the query point into the second decoding module, so that the construction of the three-dimensional object can be realized according to the position relationship corresponding to a plurality of query points. Optionally, in the application scenario, the training labels of the samples may include real position relationships corresponding to the multiple sample query points, and the corresponding geometric loss is obtained by calculating a difference between the real position relationships corresponding to the multiple sample query points and the predicted position relationship output by the decoding portion of the neural network model.

In an alternative embodiment of the present application, constructing an octree structure corresponding to each three-dimensional shape includes:

taking a node corresponding to a three-dimensional space surrounding the three-dimensional shape as a root node;

taking the root node as an initial node to be divided, repeatedly executing the following operations on the node to be divided until the node to be divided does not exist, and obtaining an octree structure corresponding to the three-dimensional shape on the basis of the obtained nodes:

dividing the space corresponding to the node to be divided into eight subspaces, wherein each node corresponding to the eight subspaces is the eight child nodes of the node;

for each child node, determining a first structural feature corresponding to the child node;

determining sub-nodes to be divided in each sub-node based on the first structural characteristics corresponding to each sub-node;

and taking the determined sub-nodes to be divided as new nodes to be divided.

This optional embodiment provides a method for constructing the octree structure. When constructing the octree structure corresponding to a three-dimensional shape, the method considers two kinds of information, namely the occupancy of the surface of the three-dimensional shape in the space corresponding to a node and the geometric complexity of that surface, which together form the first structural feature. Space subdivision is therefore performed again only on important nodes whose two kinds of information satisfy a certain condition; that is, whether to divide the space corresponding to a node again is determined according to the first structural feature of the node.

Optionally, for each child node, determining a first structural feature corresponding to the child node includes:

if the space corresponding to the child node comprises a three-dimensional surface structure, determining a first characteristic value of a first characteristic of the child node as a first value, otherwise, determining the first characteristic value as a second value;

if the space corresponding to the child node contains a three-dimensional surface structure, determining the geometric complexity of the surface structure, and determining a second characteristic value of a second characteristic of the child node according to the geometric complexity, wherein the second characteristic value is a third value or a fourth value;

if the space corresponding to the sub-node does not contain the three-dimensional surface structure, determining a second characteristic value of a second characteristic corresponding to the sub-node as a fourth value;

wherein the first structural feature corresponding to the child node comprises the first characteristic value and the second characteristic value.

Optionally, determining the child nodes to be divided among the child nodes based on the first structural features corresponding to the child nodes includes:

and determining the nodes of which the first characteristic value is the first value and the second characteristic value is the third value in the first structural characteristics corresponding to each sub-node as the sub-nodes to be divided.

Based on the alternative embodiment, the nodes needing further division can be determined quickly and accurately. Optionally, the values of the first value and the third value may be 1, and the values of the second value and the fourth value may be 0.
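The construction loop and the division rule above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: it assumes that the surface is given as sampled points, that the geometric-complexity test is supplied by the caller as a hypothetical `is_complex` predicate, and that a hypothetical `max_depth` bounds the recursion. The names `OctreeNode`, `first_structural_feature`, `build_octree` and `count_nodes` are illustrative only.

```python
from collections import deque
from dataclasses import dataclass, field
from itertools import product
from typing import List, Tuple

import numpy as np

FIRST_VALUE, SECOND_VALUE = 1, 0   # first characteristic: surface present / absent
THIRD_VALUE, FOURTH_VALUE = 1, 0   # second characteristic: complex / not complex


@dataclass(eq=False)
class OctreeNode:
    center: np.ndarray                      # centre of the node's cubic space
    half: float                             # half of the cube's edge length
    depth: int
    feature: Tuple[int, int] = (FIRST_VALUE, THIRD_VALUE)
    children: List["OctreeNode"] = field(default_factory=list)


def first_structural_feature(points, node, is_complex):
    """Return (first characteristic value, second characteristic value)."""
    inside = np.all(np.abs(points - node.center) <= node.half, axis=1)
    if not inside.any():                    # no surface sample in this space
        return (SECOND_VALUE, FOURTH_VALUE)
    second = THIRD_VALUE if is_complex(points[inside]) else FOURTH_VALUE
    return (FIRST_VALUE, second)


def build_octree(points, is_complex, max_depth=3):
    """Divide nodes breadth-first until no node to be divided remains."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    root = OctreeNode((lo + hi) / 2.0, float((hi - lo).max()) / 2.0, depth=0)
    to_divide = deque([root] if max_depth > 0 else [])
    while to_divide:
        node = to_divide.popleft()
        for signs in product((-1.0, 1.0), repeat=3):   # the eight subspaces
            child = OctreeNode(node.center + 0.5 * node.half * np.array(signs),
                               node.half / 2.0, node.depth + 1)
            child.feature = first_structural_feature(points, child, is_complex)
            node.children.append(child)
            # Divide again only if the surface is present AND complex.
            if child.feature == (FIRST_VALUE, THIRD_VALUE) and child.depth < max_depth:
                to_divide.append(child)
    return root


def count_nodes(node):
    return 1 + sum(count_nodes(c) for c in node.children)
```

With two sample points at opposite corners of a cube and a predicate that always reports "complex", only the two occupied child nodes are divided again, so the number of nodes grows with the occupied space rather than with the full volume.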

As can be seen from the foregoing description, as another alternative, the first structural feature may not include the second feature value, and accordingly, for any child node, if the first feature value of the node is the first value, the child node is determined to be the child node to be divided.

As an alternative, for any node, determining the geometric complexity of the surface structure corresponding to the node may include:

determining the normal variation of the surface structure, and determining the geometric complexity based on the normal variation.

Specifically, for a surface structure in a space, a plurality of sampling points may be obtained by sampling on the surface structure, the variation of the normal direction of the surface structure at each sampling point (referred to as the normal variation for short) is calculated, and the normal variation of the surface structure is obtained from the normal variations corresponding to the sampling points (for example, by averaging the normal variations corresponding to the sampling points). Optionally, if the normal variation of the surface structure is greater than or equal to a set threshold, the second characteristic value representing the geometric complexity may be the third value; otherwise, it is the fourth value.

In an optional embodiment of the present application, for any sample, the encoding module obtains a feature representation corresponding to each sample node of the sample by encoding in the following manner:

sequentially encoding the sample nodes in the octree structure corresponding to the sample in bottom-up order to obtain the first geometric feature corresponding to each sample node, wherein, for a sample node without child nodes in the octree structure, the first geometric feature is obtained by the encoding module encoding the voxelized data corresponding to the sample node, and, for a sample node with child nodes, the first geometric feature is obtained by the encoding module encoding the first geometric features and first structural features of the child sample nodes of the sample node.

This optional embodiment provides a hierarchical encoding method. The encoding process starts from the finest-grained nodes of the octree structure (i.e., the nodes without child nodes) and encodes the nodes of each level bottom-up. Specifically, the node data of the finest-grained octree nodes (i.e., the voxelized data obtained by voxelizing the structure in the space corresponding to each node) are first input into the encoder corresponding to that level to obtain the first geometric feature of each node of the level. For every other octree node (i.e., every parent node), the encoder obtains the first geometric feature of the node by fusing the first geometric features of its child nodes.
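The bottom-up order can be illustrated with a short sketch. It assumes a hypothetical `SampleNode` container and takes the leaf encoder and the fusion encoder as caller-supplied functions (`encode_leaf`, `fuse_children`), so it shows only the traversal, not the networks. A post-order traversal is used here, which guarantees that every child is encoded before its parent; a strict level-by-level schedule would satisfy the same constraint.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class SampleNode:
    name: str
    voxels: Optional[Any] = None            # voxelized data (leaf nodes only)
    structure: tuple = (1, 0)               # first structural feature
    children: List["SampleNode"] = field(default_factory=list)
    geometry: Optional[Any] = None          # first geometric feature, set by encoding


def encode_bottom_up(node: SampleNode,
                     encode_leaf: Callable,
                     fuse_children: Callable,
                     order: Optional[list] = None):
    """Encode all children first, then the node itself (post-order)."""
    for child in node.children:
        encode_bottom_up(child, encode_leaf, fuse_children, order)
    if node.children:
        # Parent: fuse the (geometry, structure) pairs of its child nodes.
        node.geometry = fuse_children([(c.geometry, c.structure) for c in node.children])
    else:
        # Leaf: encode the voxelized data directly.
        node.geometry = encode_leaf(node.voxels)
    if order is not None:
        order.append(node.name)
    return node.geometry
```

In a toy run where `encode_leaf` is `sum` and `fuse_children` takes the maximum child feature, the recorded order lists every leaf before the parent that consumes it.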

In an optional embodiment of the application, for a sample, a training label includes a first label and a second label, the first label represents a real position relationship corresponding to each of a plurality of sample query points, the second label represents a real structural feature corresponding to each sample node in an octree structure corresponding to the sample, and an output of a first decoding module further includes a predicted structural feature corresponding to each sample node;

the total loss function of the neural network model comprises a first loss function and a second loss function, wherein for a sample, the value of the first loss function represents the difference between the real position relationship and the predicted position relationship corresponding to each sample query point in the octree structure corresponding to the sample, and the value of the second loss function represents the difference between the real structure feature and the predicted structure feature (i.e. the second structure feature corresponding to the sample node) corresponding to each sample node corresponding to the sample.

In an optional embodiment of the present application, the encoding module includes a cascaded feature extraction module and a VAE, a geometric feature corresponding to a node output by the encoding module is determined based on a normal distribution of the VAE, the total loss function further includes a third loss function, and for a sample, a value of the third loss function represents a difference between the normal distribution of the sample corresponding to the VAE and a reference normal distribution.

With an encoding structure that contains a VAE, the VAE re-parameterization technique can be used to constrain the distribution of the latent space to fit a normal distribution, so that the feature representation obtained by encoding contains more spatial semantic information. The reference normal distribution may be the standard normal distribution with an expected value of 0 and a standard deviation of 1, and the third loss function constrains the normal distribution output by the VAE to approximate the standard normal distribution.
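A possible way to combine the three loss terms is sketched below. The text only states that each term measures a "difference", so the concrete choices here are assumptions: mean absolute error for the positional relationships, binary cross-entropy for the structural features, the closed-form KL divergence to N(0, 1) for the VAE term, and hypothetical weights `w_struct` and `w_kl`.

```python
import numpy as np


def kl_to_standard_normal(mu, sigma):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions."""
    return float(0.5 * np.sum(mu ** 2 + sigma ** 2 - 1.0 - 2.0 * np.log(sigma)))


def binary_cross_entropy(pred, target, eps=1e-7):
    pred = np.clip(pred, eps, 1.0 - eps)    # avoid log(0)
    return float(-np.mean(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred)))


def total_loss(sdf_pred, sdf_true, struct_pred, struct_true, mu, sigma,
               w_struct=1.0, w_kl=1.0):
    # First loss: predicted vs. real positional relationship of the query points.
    first = float(np.mean(np.abs(sdf_pred - sdf_true)))
    # Second loss: predicted vs. real structural features of the sample nodes.
    second = binary_cross_entropy(struct_pred, struct_true)
    # Third loss: VAE distribution vs. the reference standard normal distribution.
    third = kl_to_standard_normal(mu, sigma)
    return first + w_struct * second + w_kl * third
```

The third term is zero exactly when the VAE outputs an expected value of 0 and a standard deviation of 1, which matches the reference distribution named in the text.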

The solution provided by the embodiments of the present application can be applied to various tasks including, but not limited to, 3D shape reconstruction, shape completion from partial and noisy inputs, and image-based 3D reconstruction. The way of obtaining the geometric features of a three-dimensional object provided by the embodiments of the present application may be referred to as Octree Field for short. It is a learnable hierarchical implicit representation of a 3D surface (i.e., a surface structure) that allows a complex surface to be encoded with high accuracy under a low memory and computational budget. A hierarchical octree structure is introduced to adaptively subdivide the 3D space according to surface occupancy and richness of geometry (i.e., the geometric complexity of the surface structure). Since octrees are discrete and non-differentiable, the optional schemes of the present application further propose a new hierarchical network (the encoding and decoding modules described above) that recursively encodes and decodes the octree structure and the surface geometry in a differentiable way. The scheme combines the advantages of local implicit representations and hierarchical data structures by associating a local implicit function with each octree node, so that a large-scale scene with fine geometric details can be modeled in three dimensions with a small memory overhead. The method provided by the embodiments of the present application can greatly reduce the computational cost, so that the inference speed in actual use can be increased.

To better explain the solutions provided in the present application, they are further described below with reference to an optional embodiment in a specific application. This embodiment takes three-dimensional object reconstruction as the example application scenario. The three-dimensional object may be any three-dimensional object in a game scene, such as an object or a character. Taking a battle game as an example, the three-dimensional object may be a virtual character corresponding to a player, an NPC (non-player character), or a virtual item in the game (such as a table, a chair, a car, or another virtual item).

The scheme provided by the embodiments of the present application can be implemented using a neural network model in the field of artificial intelligence, and can be divided into a training stage and a test/application stage of the neural network model (the principle of the application stage is the same as that of the test stage). Fig. 2a shows a structural schematic diagram of the neural network model in the training stage, and fig. 2b shows a schematic diagram of the process of performing 3D reconstruction in the test stage based on the trained 3D object reconstruction model; through the process shown in fig. 2b, a 3D image of a three-dimensional object can be reconstructed from the source data corresponding to the object. As shown in fig. 2a, in the training stage the neural network model includes a hierarchical encoder and a hierarchical decoder (i.e., the hierarchical encoder-decoder shown in the figure), where the hierarchical encoder is the encoding module described above and the hierarchical decoder is the decoding module described above. The hierarchical encoder-decoder is trained on a large number of training samples to obtain a trained decoder (i.e., the trained local decoder shown in the figure), which serves as the decoding model in the test and application stages, i.e., the model structure that implements the hierarchical feature decoding step 12 shown in fig. 2b. The scheme of this embodiment is described below with reference to fig. 2a and fig. 2b.

During training, an adaptive octree (octree structure) is first constructed for each 3D shape (i.e., sample) in the training set, which is the octree construction shown in fig. 2a. The hierarchical encoder-decoder network is then trained to produce a trained local decoder (i.e., the trained decoder) that is deployed at test time. At test time, a partial 3D input (e.g., a partial point cloud or a mesh) or a 2D image may be used as input. Features extracted from the input are first mapped to the latent space of the trained local decoder; that is, feature mapping converts the input source data into data in the feature space of the decoder that meets the decoder's input requirements. The feature mapping step may be implemented by a trained feature extraction network. The embodiments of the present application do not limit the specific structure of this network, and an existing feature extraction network may be used as long as, after training, its output is feature data in the feature space of the decoder. Hierarchical feature decoding is then performed to obtain the constructed implicit field (i.e., the second geometric features described above), and the output 3D mesh is reconstructed by converting the implicit field into a mesh (the implicit-field-to-mesh conversion step shown in the figure). Since feature mapping and implicit-field-to-mesh conversion can be implemented with the prior art and are clear to those skilled in the art, these two parts are not explained in this embodiment. The following description focuses on the novel octree-based implicit representation (i.e., octree construction), the hierarchical encoder-decoder network, and the corresponding training process.

1. About octree construction

For each training sample, i.e., three-dimensional shape, in the training dataset, an octree structure (which may be referred to as an octree) for each three-dimensional shape needs to be constructed. To construct an octree of the input model, the three-dimensional shape can first be scaled to an axis-aligned bounding box and then the bounding regions recursively subdivided in breadth-first order. Subdivided octree nodes (i.e., nodes that need further subdivision) need to satisfy two requirements simultaneously:

(1) The node encloses the surface, that is, the space corresponding to the node contains the surface of the three-dimensional shape;

(2) The geometry enclosed by the node is complex enough to be subdivided, that is, the geometric complexity of the surface enclosed by the node meets a certain requirement.

Alternatively, the normal variation of a surface may be used as an indicator of the complexity of its geometry. For a surface S, the normal variation V(S) may be calculated by the following formula:

$$V(S) = E_i\left(V(n_x^i) + V(n_y^i) + V(n_z^i)\right)$$

where $n_x^i$, $n_y^i$ and $n_z^i$ respectively denote the components in the x, y and z directions of the normal vector $n^i$ at the i-th sampling point on the surface S; $V(n_x^i)$, $V(n_y^i)$ and $V(n_z^i)$ respectively denote the variation in the x, y and z directions of the normal vectors of the sampling points on the surface S; and $E_i(\cdot)$ denotes taking the mean value (expected value).

Alternatively, the surface S may be sampled at pre-calculated sampling points. When constructing the octree, nodes are repeatedly subdivided according to the two requirements above until the depth of the octree reaches a preset depth d or the normal variation V(S) of a node is less than or equal to a set value τ. V(S) ≤ τ indicates that the geometric complexity of the surface does not meet the requirement, so the node is not subdivided further. The set value τ may be set according to an empirical or experimental value; optionally, τ = 0.1.
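The normal-variation test and the two stopping conditions can be sketched as follows. This is one reading of the formula, stated as an assumption: the per-sample variation is taken as the squared deviation of each normal component from the mean normal, summed over x, y and z, and then averaged over the sampling points. The function names and the `max_depth` parameter (standing in for the preset depth d) are illustrative.

```python
import numpy as np


def normal_variation(normals):
    """V(S): mean over sampling points of the summed per-component squared deviation."""
    n = np.asarray(normals, dtype=float)               # shape (num_samples, 3)
    per_sample = np.sum((n - n.mean(axis=0)) ** 2, axis=1)
    return float(per_sample.mean())


def needs_subdivision(normals, depth, max_depth, tau=0.1):
    """Subdivide only if a surface is present, depth < d, and V(S) > tau."""
    if len(normals) == 0 or depth >= max_depth:
        return False
    return normal_variation(normals) > tau
```

Under this reading a flat patch (all normals equal) gives V(S) = 0 and is never subdivided, while two perpendicular unit normals give V(S) = 0.5, well above τ = 0.1.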

The left part of fig. 4 shows a schematic two-dimensional view of a "rabbit" shape, where $O_i$ and $O_j$ are two-dimensional representations of the spaces corresponding to two octree nodes of the same level; both are regions of the same size as the region marked A1. In the figure, $V(O_i)$ and $V(O_j)$ respectively denote the normal variation of the rabbit surface structure contained in $O_i$ and $O_j$, and the bar graphs show the distribution of the normal variation over the corresponding sampling points. As can be seen from the figure, the geometry of the surface in $O_i$ is highly complex while that in $O_j$ is less complex, so when the octree structure is constructed the space corresponding to $O_i$ needs to be further subdivided and the space corresponding to $O_j$ does not. In the spatial decomposition result shown on the left of fig. 4, the space corresponding to $O_i$ is further subdivided into a plurality of subspaces, while the space corresponding to $O_j$ is not.

Using the octree structure, the 3D space can be decomposed into hierarchical local regions. An encoder can then encode the local shape in each enclosed space with a learned implicit function, starting from the finest-grained octree nodes (i.e., the leaf nodes at the bottom), to obtain the feature representation corresponding to each node. In the octree construction scheme provided by the embodiments of the present application, the decomposition strategy that decides whether a space needs to be further subdivided considers not only surface occupancy (whether the space corresponding to the node encloses the surface) but also the richness of the geometry. A space that meets the requirements continues to be decomposed to obtain the corresponding child nodes, and a space that does not is left undivided. As a result, when the encoder encodes the spaces corresponding to the nodes, octree nodes with embedded implicit kernels (implicit functions) are distributed only around the surface. Only octree nodes containing complex geometry are further divided, which ensures adaptive allocation of memory and computational resources: more local implicit functions are used to capture richer surface details, so higher modeling accuracy can be achieved. Conversely, no implicit kernel is allocated to unoccupied regions, which saves memory and computation.

Fig. 3 shows the space division of the octree structure corresponding to the three-dimensional shape of an airplane. When the cubic space surrounding the airplane is decomposed, regions with abundant geometric details are adaptively subdivided further to obtain higher modeling accuracy. Complex parts (such as the jet engines, tail planes and landing gear, whose spaces correspond to the intermediate nodes of depth 4 shown in the figure) are automatically subdivided, so that more implicit kernels (i.e., implicit functions) are used in encoding to achieve higher modeling accuracy, while parts of the fuselage with a regular shape (corresponding to the leaf nodes of depth 3 shown in the figure) are encoded with a coarser representation; that is, parts with a simpler shape are not subdivided further.

2. Hierarchical encoder

A schematic diagram of the 2D effect of the hierarchical encoder-decoder network provided in an example of the present application is shown in fig. 4. As shown in fig. 4, the encoder includes hierarchical local encoders, such as hidden layer 111 of one local encoder and hidden layer 112 of another local encoder shown in the figure. Local encoders of different levels correspond to nodes of different levels, i.e., to spaces of different sizes, and encode the local geometric features and the octree structure of the three-dimensional shape into latent codes (feature representations). The encoder supports local geometry encoding; optionally, the encoder architecture may employ a 3D voxel convolutional neural network (i.e., the voxel CNN shown in fig. 4) to extract the local geometric features (initial geometric features). As can be seen from the figure, the local encoder of the first level (the level corresponding to hidden layer 111) encodes the shapes contained in the bottom-level octree nodes, and the local encoder of the second level (the level corresponding to hidden layer 112) encodes the shapes contained in the octree nodes one level above the bottom (the shape contained in region A1 in the figure).

After the octree has been constructed for the input model, the surface enclosed in the space corresponding to each octree node may be voxelized at a resolution of $32^3$ (one optional choice) to obtain voxelized data. The encoding process of the hierarchical encoder starts from the finest-grained (lowest-level) octree nodes and proceeds bottom-up. For each octree node $O_i$, a binary flag pair $(\alpha_i, \beta_i)$ can be calculated from the geometry it encloses. This pair is the structural feature corresponding to the node: it indicates whether the space corresponding to the node encloses part of the surface of the three-dimensional shape and whether, given the complexity of the enclosed geometry, the node needs to be further subdivided. Here $\alpha_i$ corresponds to the surface occupancy and $\beta_i$ to the geometric complexity of the surface. If a surface is present inside the node, $\alpha_i = 1$; otherwise $\alpha_i = 0$. If $O_i$ satisfies the two subdivision requirements described above, $\beta_i = 1$; otherwise $\beta_i = 0$. The voxelized geometry $G_i$ enclosed by $O_i$ (i.e., the voxelized data) is then passed to the voxel CNN to extract the geometric feature $g_i$ of $O_i$. When processing a higher-level octree node, the encoder fuses the implicit features of the child octree nodes into their parent octree node. Optionally, for a parent octree node $O_k$, the implicit feature (i.e., the feature representation, comprising the first geometric feature and the first structural feature) of each of its child octree nodes can be expressed as:

$$\left(g_{c_j}, \alpha_{c_j}, \beta_{c_j}\right), \quad c_j \in C_k$$

where $c_j \in C_k$ denotes that $c_j$ is the j-th child octree node of $O_k$; the tuple as a whole is the feature representation corresponding to $c_j$; $g_{c_j}$ denotes the geometric feature (first geometric feature) corresponding to $c_j$; and $\alpha_{c_j}$ and $\beta_{c_j}$ denote the structural feature (first structural feature) corresponding to $c_j$. The feature representation may be the concatenation of $g_{c_j}$, $\alpha_{c_j}$ and $\beta_{c_j}$.

The encoder $E_k$ corresponding to node $O_k$ fuses the implicit features of the child nodes of $O_k$ into the geometric feature $g_k$ of $O_k$, which can be expressed as:

$$g_k = E_k\left(\left(g_{c_1}, \alpha_{c_1}, \beta_{c_1}\right), \ldots, \left(g_{c_8}, \alpha_{c_8}, \beta_{c_8}\right)\right)$$

where $(g_{c_1}, \alpha_{c_1}, \beta_{c_1}), \ldots, (g_{c_8}, \alpha_{c_8}, \beta_{c_8})$ respectively denote the feature representations of the eight child nodes of $O_k$, and $E_k(\cdot)$ denotes further encoding these feature representations to obtain $g_k$. The feature representation of $O_k$ is then obtained by concatenating $g_k$ with the first structural feature $(\alpha_k, \beta_k)$ of $O_k$. Recursive feature encoding and aggregation are performed in this way until the root node of the octree has been processed.

The embodiments of the present application do not limit the specific network structure of the encoder. Optionally, the encoder $E_i$ may consist of, in cascade, a voxel CNN, a single-layer perceptron (SLP), a max-pooling layer, and another SLP for output. At the end of the encoder, the VAE re-parameterization technique may also be used to encourage the latent space distribution to fit a normal distribution, where μ and σ shown in fig. 4 respectively denote the expected value and standard deviation in the normal distribution parameter vector output by the VAE. During training, the output of the VAE may be constrained by a reference normal distribution (e.g., the standard normal distribution with an expected value of 0 and a standard deviation of 1, shown as N(0, 1) in fig. 4).

Optionally, the local encoders $E_i$ of all levels in the hierarchical encoder may share model parameters, to take advantage of local geometric similarity and reduce the number of network parameters. Of course, the SLP may be replaced with an MLP (multilayer perceptron), and the 3D CNN may be replaced with a 3D ResNet or another encoder for extracting geometric features; for example, when the input data of the model is point cloud data, PointNet or PointNet++ may be used.

Fig. 5 shows a partial structural schematic diagram of an optional hierarchical encoder $E_i$, in which $(g_{c_1}, \alpha_{c_1}, \beta_{c_1})$ to $(g_{c_8}, \alpha_{c_8}, \beta_{c_8})$ respectively denote the feature representations of the 8 child nodes of a parent octree node. Taking $(g_{c_1}, \alpha_{c_1}, \beta_{c_1})$ as an example, $\alpha_{c_1}$ and $\beta_{c_1}$ denote the first structural feature corresponding to the first child node and $g_{c_1}$ denotes the first geometric feature corresponding to the first child node; $\alpha_k$ and $\beta_k$ denote the first structural feature of the parent octree node. As shown in the figure, the feature representations of the eight child nodes are input into a first MLP, and max pooling is applied to the output features of that MLP to obtain a hidden vector corresponding to the parent octree node. The hidden vector is then encoded by a second MLP to obtain the first geometric feature $g_k$ of the parent octree node (i.e., $e_k$ in the figure). Concatenating $g_k$ with the first structural feature $\alpha_k$ and $\beta_k$ gives the feature representation of the parent octree node.
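The fusion step described for fig. 5 (first MLP, max pooling, second MLP, concatenation with the parent's structural feature) can be sketched in NumPy as follows. The feature sizes `FEAT` and `HID`, the random untrained weights, and the helper names `make_mlp`, `mlp` and `fuse_children` are all assumptions made for illustration; no trained network from the embodiment is reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
FEAT, HID = 16, 32          # hypothetical feature and hidden sizes


def make_mlp(sizes):
    """Random (untrained) weights and zero biases for a small MLP."""
    return [(rng.standard_normal((i, o)) * 0.1, np.zeros(o))
            for i, o in zip(sizes[:-1], sizes[1:])]


def mlp(x, layers):
    for k, (w, b) in enumerate(layers):
        x = x @ w + b
        if k < len(layers) - 1:
            x = np.maximum(x, 0.0)          # ReLU between layers
    return x


mlp1 = make_mlp([FEAT + 2, HID, HID])       # applied to each child representation
mlp2 = make_mlp([HID, HID, FEAT])           # applied to the pooled hidden vector


def fuse_children(child_reprs, alpha_k, beta_k):
    """child_reprs: array (8, FEAT + 2), one row (g, alpha, beta) per child node."""
    hidden = mlp(child_reprs, mlp1)         # (8, HID)
    pooled = hidden.max(axis=0)             # max pooling over the eight children
    g_k = mlp(pooled, mlp2)                 # first geometric feature of the parent
    return np.concatenate([g_k, [alpha_k, beta_k]])   # parent feature representation
```

Because the pooling is a maximum over the eight rows, the result does not depend on the order in which the child nodes are listed, which is one reason a max-pooling aggregator is convenient here.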

3. Hierarchical decoder

The purpose of the feature decoder D is to decode the octree structure and the local octree nodes from the input global feature. It is composed of hierarchical local decoders $\{D_i\}$ (including the hidden layer structures shown in fig. 4, drawn as ellipses containing a plurality of solid and hollow dots, and corresponding to the first decoding module described above) and has a mirror structure with respect to the hierarchical encoder E. In a 3D modeling application, it may further include an implicit decoder (i.e., the ImpOct Decoder shown in fig. 4, corresponding to the second decoding module described above). As shown in fig. 4, the local decoder $D_i$ denotes the decoder corresponding to nodes of the i-th level. In contrast to the encoder E, the decoding process starts from the root node and recursively decodes the hidden vectors of the child octree nodes in a top-down manner. Specifically, for a parent octree node $O_k$ with geometric feature $g_k$, the decoder $D_k$ decodes the geometric features of its child octree nodes, which can be expressed as:

$$\left(g_{c_1}, \alpha_{c_1}, \beta_{c_1}\right), \ldots, \left(g_{c_8}, \alpha_{c_8}, \beta_{c_8}\right) = D_k(g_k)$$

where $D_k(g_k)$ denotes decoding $g_k$ to obtain the features of the eight child nodes of $O_k$; $c_j$ denotes the j-th child node of $O_k$; and $(g_{c_j}, \alpha_{c_j}, \beta_{c_j})$, obtained by decoding $g_k$, denotes the geometric feature of $c_j$ together with two indicators, i.e., the structural feature (second structural feature), which determine whether the child node needs to be decoded or subdivided further. For a parent node, its 8 child nodes can be decoded synchronously.

The embodiments of the present application do not limit the specific network structure of the decoder. Optionally, the decoder (first decoding module) may include two SLPs (or MLPs) and two classifiers. For a node $O_k$, one SLP decodes the first geometric feature $g_k$ of the node to obtain the hidden features corresponding to the 8 child nodes of the node. Let $c_j$ denote the j-th child node of $O_k$, with its hidden feature written here as $z_{c_j}$. To determine the target nodes among the 8 child nodes and to obtain the second geometric feature of each node, whether a child node is a target node is judged by decoding its hidden feature. For this purpose, two classifiers $I_g$ and $I_h$ predict the surface occupancy of the child node and the necessity of subdividing it further. Specifically, $z_{c_j}$ is input into the classifiers $I_g$ and $I_h$ respectively. Decoding $z_{c_j}$ with $I_g$ gives $\alpha_{c_j}$, which can be expressed as $\alpha_{c_j} = I_g(z_{c_j})$; decoding $z_{c_j}$ with $I_h$ gives $\beta_{c_j}$, which can be expressed as $\beta_{c_j} = I_h(z_{c_j})$. Here $\alpha_{c_j}$ and $\beta_{c_j}$ respectively denote the first characteristic value and the second characteristic value in the second structural feature.

If $\alpha_{c_j}$ is below a set threshold (an optional value), $c_j$ contains no surface of the three-dimensional shape, i.e., none of its geometry, and $c_j$ requires no further processing. Otherwise, the space corresponding to $c_j$ contains a surface, and a further determination is made based on $\beta_{c_j}$. Specifically, if $\beta_{c_j}$ is below a set threshold (an optional value), the geometry of the surface in $c_j$ is relatively simple and $c_j$ is not subdivided further; the decoded geometric feature $g_{c_j}$ is the second geometric feature of $c_j$, and the surface contained in $c_j$ can subsequently be predicted based on this feature. Otherwise, $c_j$ needs to be subdivided further; that is, $c_j$ is taken as a parent node and the same decoding process as above is performed to obtain the features of each child node of $c_j$. This decoding process is repeated for the nodes of the octree, starting from the root node and proceeding top-down, until no node requires further decoding.
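The top-down decoding loop can be sketched as follows. Everything numeric here is an assumption for illustration: the weights are random and untrained, the threshold `THRESH = 0.5` is a hypothetical value (the text only says the threshold is optional), and `max_depth` is added so that the loop is guaranteed to terminate with untrained classifiers. `W_split`, `W_geo`, `w_g` and `w_h` stand in for the first SLP, the second SLP and the classifiers $I_g$ and $I_h$.

```python
from collections import deque

import numpy as np

rng = np.random.default_rng(0)
FEAT, HID = 16, 32          # hypothetical feature and hidden sizes
THRESH = 0.5                # hypothetical classifier threshold

W_split = rng.standard_normal((FEAT, 8 * HID)) * 0.5   # g_k -> 8 hidden features
W_geo = rng.standard_normal((HID, FEAT)) * 0.5          # hidden -> geometric feature
w_g = rng.standard_normal(HID)                          # classifier I_g (occupancy)
w_h = rng.standard_normal(HID)                          # classifier I_h (subdivision)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def decode_children(g_k):
    """Decode the eight child nodes of a parent synchronously."""
    hidden = np.tanh(g_k @ W_split).reshape(8, HID)
    alpha = sigmoid(hidden @ w_g)           # first characteristic value per child
    beta = sigmoid(hidden @ w_h)            # second characteristic value per child
    geo = hidden @ W_geo                    # geometric feature per child
    return geo, alpha, beta


def decode_octree(g_root, max_depth=3):
    """Return (depth, second geometric feature) for every decoded leaf node."""
    leaves = []
    queue = deque([(g_root, 0)])            # start from the root, go top-down
    while queue:
        g_k, depth = queue.popleft()
        geo, alpha, beta = decode_children(g_k)
        for j in range(8):
            if alpha[j] < THRESH:           # no surface: nothing more to do
                continue
            if beta[j] >= THRESH and depth + 1 < max_depth:
                queue.append((geo[j], depth + 1))   # target node: decode again
            else:
                leaves.append((depth + 1, geo[j]))  # keep as second geometric feature
    return leaves
```

Child nodes predicted as empty are dropped immediately, so decoding work is spent only on the occupied part of the space, mirroring the adaptive construction on the encoder side.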

Fig. 6 shows a partial structural schematic diagram of an optional hierarchical decoder $D_i$, where $g_k$ (i.e., $e_k$) shown in the figure denotes the first geometric feature of a parent octree node. $g_k$ is input into a first MLP, which decodes the hidden vectors corresponding to the child nodes of the parent octree node. Taking the first child node as an example, its hidden vector is input into the two classifiers to obtain the second structural feature of the child node, i.e., $\alpha_{c_1}$ and $\beta_{c_1}$ shown in fig. 6. The hidden vector is also input into a second MLP, which decodes the second geometric feature $g_{c_1}$ corresponding to the child node. Concatenating $g_{c_1}$, $\alpha_{c_1}$ and $\beta_{c_1}$ gives the decoded feature representation $(g_{c_1}, \alpha_{c_1}, \beta_{c_1})$ corresponding to the child node. The feature representations of the other 7 child nodes shown in the figure, $(g_{c_2}, \alpha_{c_2}, \beta_{c_2})$ to $(g_{c_8}, \alpha_{c_8}, \beta_{c_8})$, are obtained in the same way. Furthermore, if the child node is determined from $\alpha_{c_1}$ and $\beta_{c_1}$ to be a target node, i.e., a node that needs further decoding, the decoding process is repeated for that child node.

For the second geometric feature of each node decoded by the hierarchical decoder, the 3D surface located in the space corresponding to the node can be reconstructed by a local implicit decoder G (i.e., the second decoding module). For a node $O_i$, the second geometric feature $g_i$ of the node and a 3D query position x (i.e., the three-dimensional coordinates, which may be world coordinates, of a query point, which may be any point in space) are input into the implicit decoder G to predict the signed distance corresponding to the query point (such as an SDF distance, which may also be referred to as an SDF value). The signed distance characterizes the positional relationship between the query point and the surface located in $O_i$. The corresponding 3D surface shape can then be reconstructed based on the positional relationships between a large number of query points and the surface located in the node.
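The interface of the implicit decoder G, which takes a node's second geometric feature and a query position and returns one signed distance per query point, can be sketched as below. The two-layer network, its sizes `FEAT` and `HID`, and the random untrained weights are assumptions for illustration; the values it returns are therefore meaningless as distances and only the data flow is shown.

```python
import numpy as np

rng = np.random.default_rng(0)
FEAT, HID = 16, 32          # hypothetical feature and hidden sizes

W1 = rng.standard_normal((FEAT + 3, HID)) * 0.3   # input: [g_i, x, y, z]
b1 = np.zeros(HID)
W2 = rng.standard_normal((HID, 1)) * 0.3          # output: one SDF value


def implicit_decoder(g_i, x):
    """Predict a signed distance for each query position x, conditioned on g_i.

    g_i: array (FEAT,), second geometric feature of one node.
    x:   array (3,) or (n, 3), query positions.
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    g = np.broadcast_to(g_i, (x.shape[0], g_i.shape[0]))   # repeat g_i per query
    h = np.maximum(np.concatenate([g, x], axis=1) @ W1 + b1, 0.0)
    return (h @ W2)[:, 0]
```

Querying many points per node in a single call, as the batched input allows, is what makes it practical to evaluate the large number of query points needed for surface reconstruction.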

As shown in FIG. 4, e_k1 and e_k2 represent the geometric features of two nodes obtained by decoding. In the figure, [x, y, z] represents the three-dimensional coordinates of a query point in space. The query positions corresponding to the upper and lower [x, y, z] in the figure are the points indicated by the dashed arrows: the upper query point lies in the small filled square region in the upper part of the right-hand graphic of FIG. 4, and the lower query point lies in the filled square region in the lower part of that graphic. As can be seen from the figure, after the lower query point and the second geometric feature of node O_i are input into the implicit decoder, the positional relationship determined by decoding is that the query point lies outside the space corresponding to O_i, and the position corresponding to the decoded SDF value is the position marked "+0" in the figure. For the upper query point, the position corresponding to the SDF value decoded by the implicit decoder is the position marked "+1" in the figure.

For each octree node, the implicit decoder decodes the closed surface in the corresponding space from the implicit feature (namely, the second geometric feature of the node output by the hierarchical decoder). However, the finest-granularity octree nodes may have different sizes, so when querying the value of the local implicit function (the SDF value), the world coordinate x of the query point may be normalized relative to the center x_i of the node. The SDF value f(x) of a query point decoded by the implicit decoder can be expressed as:

$$f(x) = G_{\theta_d}\big(g_i, N(x - x_i)\big)$$

where θ_d represents the network parameters of the implicit decoder G learned through training, g_i represents the geometric feature of node O_i decoded by the hierarchical decoder, x represents the world coordinates of any query point, x_i represents the coordinates of the center point of O_i, and N(x - x_i) denotes x normalized relative to x_i to coordinates within a specified range of values, e.g., [-1, 1]. The implicit decoder G then predicts, based on g_i and N(x - x_i), the positional relationship between x and the surface contained in O_i, such as the SDF value.
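The normalized query described above can be sketched as follows. This is a minimal sketch under stated assumptions: `half_size` (half the edge length of the node's space) is an assumed way of scaling into [-1, 1], `toy_decoder` is a hypothetical analytic stand-in for the trained implicit decoder G, and the "negative inside, positive outside" sign convention is assumed for illustration.

```python
def normalize(x, center, half_size):
    """N(x - x_i): map a world coordinate into the node's local [-1, 1] frame."""
    return tuple((xc - cc) / half_size for xc, cc in zip(x, center))


def toy_decoder(g_i, local_x):
    """Hypothetical stand-in for G: signed distance to a sphere whose radius,
    in local units, is stored in g_i[0]. A trained network would replace this."""
    r = sum(c * c for c in local_x) ** 0.5
    return r - g_i[0]


def query_sdf(x, center, half_size, g_i, decoder=toy_decoder):
    """Evaluate the local implicit function of one node at world position x."""
    return decoder(g_i, normalize(x, center, half_size))


center, half_size, g_i = (2.0, 2.0, 2.0), 0.5, [0.5]
inside = query_sdf((2.0, 2.0, 2.0), center, half_size, g_i)     # node center
on_surface = query_sdf((2.25, 2.0, 2.0), center, half_size, g_i)
outside = query_sdf((2.5, 2.0, 2.0), center, half_size, g_i)    # node boundary
```

Because the query is expressed in the node's local frame, the same decoder can serve nodes of different sizes; only `center` and `half_size` change from node to node.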

In practical applications, to avoid discontinuities at the region boundaries between nodes, the spatial region corresponding to each node may be enlarged so that the spaces of adjacent nodes in the same level partially overlap. Optionally, each node may overlap each of its neighboring nodes by 50% along each axis. When querying the implicit value (the SDF value) in an overlapping region, trilinear interpolation may be performed over all nodes whose spaces contain the query point; that is, the value obtained by interpolating the SDF values that these nodes yield for the query point is used as the SDF value of the query point.
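One possible way to blend the SDF values of overlapping nodes is sketched below. It assumes that each node's enlarged space is a cube given by a center and a half size, and it uses per-axis linear ("tent") weights whose product gives trilinear weights; the embodiment does not spell out the weighting, so this choice, and the names `tent_weight` and `blended_sdf`, are illustrative assumptions.

```python
def tent_weight(x, center, half_size):
    """Product of per-axis linear weights: 1 at the node center, 0 at its border."""
    w = 1.0
    for xc, cc in zip(x, center):
        w *= max(0.0, 1.0 - abs(xc - cc) / half_size)
    return w


def blended_sdf(x, nodes):
    """Interpolate the SDF at x over all nodes whose enlarged space covers x.

    nodes: list of (center, half_size, sdf_fn), where sdf_fn(x) returns the
    SDF value that the node's local implicit function predicts for x.
    """
    total_w = 0.0
    total = 0.0
    for center, half_size, sdf_fn in nodes:
        w = tent_weight(x, center, half_size)
        if w > 0.0:
            total_w += w
            total += w * sdf_fn(x)
    if total_w == 0.0:
        raise ValueError("query point is not covered by any node")
    return total / total_w


# Two neighbouring nodes along x whose enlarged spaces overlap by 50%.
nodes = [
    ((0.0, 0.0, 0.0), 1.0, lambda x: 0.0),
    ((1.0, 0.0, 0.0), 1.0, lambda x: 1.0),
]
value = blended_sdf((0.25, 0.0, 0.0), nodes)
```

With this layout the weights of the two nodes sum to one everywhere between their centers, so the blended value changes continuously as the query point crosses from one node into the next.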

To train a neural network model (including the hierarchical encoder, the hierarchical decoder and the implicit decoder) that meets the application requirements, as an alternative, the loss function of the neural network model may include the first loss function, the second loss function and the third loss function described above. The first loss function, which may also be referred to as the geometric loss, represents the difference between the predicted and real positional relationships of the query points, specifically the difference between the predicted and real SDF values of the query points. The second loss function includes a structural loss and a subdivision loss, which concern whether an octree node is occupied by the surface and whether the node needs to be subdivided; that is, it represents the difference between the real structural feature of the octree node (i.e., the first structural feature of the node) and the second structural feature predicted by the hierarchical decoder. The third loss function corresponds to the VAE structure in the encoder and constrains the corresponding feature space to follow a suitable distribution. The specific functional form of each part of the loss function is not limited in the embodiments of the present application. Alternatively, the first loss function may be an L2 loss or a BCE (binary cross entropy) loss, the second loss function may also be a BCE loss, and the third loss function may be a KL divergence (Kullback-Leibler divergence) loss.

The loss functions of the above parts are separately described below with reference to alternative schemes.

For an octree node O_i, the geometric loss L_geo corresponding to the node can be expressed as follows:

$$L_{geo} = \sum_{j \in P} w_j \, L_c\big(G(g_i, x_j), F(x_j)\big)$$

where P represents the set of sample points (i.e., the set of query points), j represents any query point, w_j represents the weight of query point j, x_j represents the three-dimensional position information of query point j, F(x_j) represents the label of the real positional relationship of query point j, i.e., the true label indicating whether query point j lies inside (in front of), outside (behind) or on the surface contained in O_i, G(g_i, x_j) represents the positional relationship of query point j predicted by the implicit decoder G from g_i and x_j, and L_c represents the geometric loss of query point j, i.e., the difference between the real and predicted positional relationships of query point j, such as the difference between the real and predicted SDF values. The weight w_j may be the reciprocal of the density of sample points near x_j, which compensates for variations in sampling density. Optionally, the implicit decoder G may first be pre-trained on local three-dimensional shapes, and the pre-trained implicit decoder G may then be trained further together with the other parts of the neural network model.
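A density-weighted geometric loss of the kind described above might be computed as in the following sketch. It assumes a plain weighted sum over the query points, a squared-error point loss (one of the L2 options mentioned earlier), and a neighbour count within a fixed radius as the density estimate; the radius-based estimate and all function names are assumptions for illustration only.

```python
import math


def inverse_density_weights(points, radius):
    """w_j = 1 / (number of sample points within `radius` of x_j, itself included)."""
    weights = []
    for p in points:
        neighbours = sum(1 for q in points if math.dist(p, q) <= radius)
        weights.append(1.0 / neighbours)
    return weights


def geometric_loss(points, weights, predict, truth,
                   point_loss=lambda pred, true: (pred - true) ** 2):
    """Weighted sum over query points of L_c(G(g_i, x_j), F(x_j))."""
    return sum(w * point_loss(predict(x), truth(x))
               for x, w in zip(points, weights))


# Two clustered query points and one isolated point.
points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 0.0, 0.0)]
weights = inverse_density_weights(points, radius=1.0)
loss = geometric_loss(points, weights,
                      predict=lambda x: x[0],   # stand-in for G(g_i, x)
                      truth=lambda x: 0.0)      # stand-in for F(x)
```

The two clustered points each receive half the weight of the isolated point, so densely sampled regions do not dominate the loss.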

To improve the performance of the trained neural network model, the second loss function and the third loss function allow the model to learn more information, realizing strongly supervised training of the model. Optionally, different loss functions correspond to training losses of the model in different dimensions, and the importance of the information in each dimension differs with the actual application requirements. For example, in a 3D reconstruction application scenario, the accuracy of the reconstructed 3D shape of the object is particularly important, so a larger weight can be assigned to the geometric loss. Optionally, the total loss function L_total of the model can be expressed as follows:

$$L_{total} = \mathbb{E}\big[\lambda L_{geo} + L_h + L_k + \beta L_{KL}\big]$$

where L_h and L_k represent the second loss function, which may be the classification losses corresponding to the two classifiers described above, L_KL represents the third loss function, specifically the training loss corresponding to the VAE, E[·] denotes the expected value of the loss over all nodes (which may be all nodes containing surfaces), and λ and β represent the weights of L_geo and L_KL respectively, which may be configured according to empirical and/or experimental values, such as λ = 10 and β = 0.01. During training, the loss value of the total loss function over all samples is calculated, and whether training is complete is determined based on this loss value. If the loss value does not meet the training end condition, the network parameters of the model are adjusted and training is repeated until a model meeting the training end condition is obtained. The trained hierarchical decoder can then be used to extract the three-dimensional geometric features (namely, the second geometric features) of a three-dimensional object, so that the three-dimensional object can be reconstructed, recognized or classified based on the extracted second geometric features of the nodes.
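The weighted combination of the loss terms can be illustrated with the short sketch below. It assumes that the expectation over nodes is approximated by a simple mean and that the per-node loss values have already been computed; the dictionary keys and the function name `total_loss` are hypothetical.

```python
def total_loss(per_node_losses, lam=10.0, beta=0.01):
    """Mean over nodes of lam * L_geo + L_h + L_k + beta * L_KL.

    per_node_losses: list of dicts with keys "geo", "h", "k" and "kl".
    The defaults lam=10 and beta=0.01 follow the example values in the text.
    """
    if not per_node_losses:
        raise ValueError("at least one node is required")
    terms = [lam * n["geo"] + n["h"] + n["k"] + beta * n["kl"]
             for n in per_node_losses]
    return sum(terms) / len(terms)


losses = [
    {"geo": 0.1, "h": 0.2, "k": 0.3, "kl": 1.0},
    {"geo": 0.0, "h": 0.0, "k": 0.0, "kl": 0.0},
]
value = total_loss(losses)
```

With λ = 10 the geometric term dominates the first node's contribution (1.0 out of 1.51), which reflects the larger weight given to reconstruction accuracy.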

The scheme provided by the embodiments of the present application is a learnable hierarchical implicit representation for 3D curved surfaces (namely, 3D surface structures), which allows complex surfaces to be encoded with high precision using a lower memory and computation budget. This scheme addresses the low efficiency of 3D modeling and reconstruction based on local implicit functions in the prior art. By introducing a hierarchical octree structure, the scheme adaptively subdivides the 3D space according to surface occupancy and the richness of the geometric shape. Since an octree is discrete and non-differentiable, the present application proposes a new hierarchical network (i.e., a hierarchical encoder-decoder) that can recursively encode and decode the octree structure and the surface geometry in a differentiable manner, combining the advantages of local implicit representations and hierarchical data structures. By associating a local implicit function with each octree node unit, more implicit functions are assigned to surfaces with complex geometric details during training and testing, thereby improving reconstruction accuracy. The representation provided by the embodiments of the present application can be used to model large-scale scenes with fine geometric details in three dimensions with a small memory overhead, which can greatly reduce the computation cost and increase the inference speed in practical applications. The solution provided by the embodiments of the present application can be applied to various tasks, including but not limited to 3D shape reconstruction, shape completion with partial and noisy inputs, image-based 3D reconstruction, and the like.

In a 3D reconstruction application scenario, the inventors of the present application compared the scheme provided by the present application with existing 3D reconstruction schemes experimentally. The experiments show that, compared with the existing schemes, the scheme provided by the embodiments of the present application achieves better 3D reconstruction efficiency and accuracy, and consumes less memory and computation on the computing device used.

Based on the same principle as the method provided by the embodiment of the present application, the embodiment of the present application further provides a data processing apparatus, as shown in fig. 7, the data processing apparatus 100 includes a data obtaining module 110 and a data processing module 120. Wherein:

a data obtaining module 110, configured to obtain source data of a three-dimensional object;

a data processing module 120 configured to perform the following operations:

acquiring first geometric characteristics of a plurality of nodes corresponding to source data based on the source data, wherein the plurality of nodes comprise a plurality of levels of nodes with tree structures, the nodes of different levels correspond to different spaces in a three-dimensional space of a three-dimensional object, for any parent node in the plurality of nodes, the first geometric characteristics of the parent node are obtained based on the first structural characteristics and the first geometric characteristics of a plurality of child nodes of the node, and the first structural characteristics of one node represent the occupation condition of the surface structure of the three-dimensional object contained in the space corresponding to the node relative to the whole surface of the three-dimensional object;

determining a target node in the plurality of nodes based on the acquired first geometric characteristics of the plurality of nodes, and decoding the first geometric characteristics of the target node to obtain second geometric characteristics corresponding to each sub-node under the target node; and processing the three-dimensional object based on the second geometric characteristics corresponding to the nodes obtained by decoding to obtain corresponding processing results.

Optionally, the source data includes at least one of:

point cloud data; a two-dimensional image; grid data.

Optionally, the data processing module 120 is configured to perform the following any one when performing processing on the three-dimensional object based on the second geometric feature corresponding to each node obtained by decoding to obtain a corresponding processing result:

Optionally, the data processing module 120 is configured to, when constructing the three-dimensional image of the three-dimensional object based on the second geometric features corresponding to the nodes obtained through decoding:

determining the position relation of the surface structure contained in the space corresponding to each node to be queried and each node obtained by decoding based on the second geometric characteristics corresponding to each node obtained by decoding and the three-dimensional position information of each node to be queried;

and constructing a three-dimensional image of the three-dimensional object based on the determined position relation corresponding to each point to be inquired.

Optionally, the first structural feature of a node further characterizes the geometric complexity of the surface structure of the three-dimensional object contained in the space corresponding to the node.

Optionally, when determining a target node in the multiple nodes based on the obtained first geometric features of the multiple nodes and obtaining second geometric features corresponding to the sub-nodes under the target node by decoding the first geometric features of the target node, the data processing module 120 uses a root node in the multiple nodes as an initial node to be processed according to a sequence from high to low of a hierarchy corresponding to the multiple nodes, and is implemented by repeatedly performing the following operations:

decoding to obtain a second geometric characteristic corresponding to each sub-node based on the implicit characteristic of each sub-node;

respectively taking each target child node as a new node to be processed;

Optionally, for a node, the second structural feature includes a first feature value and a second feature value, the first feature value represents an occupation situation corresponding to the node, where the occupation situation is whether a space corresponding to the node includes a surface structure of a three-dimensional object, and the second feature value represents a geometric complexity of the surface structure of the three-dimensional object included in the space corresponding to the node;

correspondingly, when determining the target child node in each child node based on the second structural feature of each child node, the data processing module 120 is configured to:

for a child node, if it is determined based on the first feature value of the child node that the space corresponding to the node contains the surface structure of the three-dimensional object, and it is determined based on the second feature value of the child node that the geometric complexity of the surface structure requires the corresponding space to be subdivided (that is, a division condition is satisfied, for example the second feature value equals a specified value or is greater than or equal to a set value), the child node is determined as a target child node.
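The target child node test and the resulting top-down decoding order can be sketched as follows. This is an illustrative sketch only: the 0.5 thresholds, the `max_depth` safety limit, and the `decode_children` callback (any function that maps a node feature to a list of child dictionaries with `occupied`, `subdivide` and `feature` entries) are assumptions, not details fixed by the embodiment.

```python
from collections import deque


def is_target(occupied, subdivide, occ_threshold=0.5, sub_threshold=0.5):
    """A child is a target node if its space contains surface AND must be subdivided."""
    return occupied >= occ_threshold and subdivide >= sub_threshold


def decode_tree(root_feature, decode_children, max_depth=3):
    """Decode level by level, starting from the root as the first node to process."""
    decoded = []                          # (depth, child) for every decoded child
    queue = deque([(root_feature, 0)])
    while queue:
        feature, depth = queue.popleft()
        for child in decode_children(feature):
            decoded.append((depth + 1, child))
            if depth + 1 < max_depth and is_target(child["occupied"],
                                                   child["subdivide"]):
                queue.append((child["feature"], depth + 1))
    return decoded


def stub_decode_children(feature):
    """Hypothetical decoder: only the first of the 8 children is a target node."""
    return [{"occupied": 0.9 if i == 0 else 0.1,
             "subdivide": 0.9 if i == 0 else 0.1,
             "feature": feature + 1} for i in range(8)]


decoded = decode_tree(0, stub_decode_children)
```

Only children that pass `is_target` are expanded, so empty or geometrically simple regions stop consuming decoding work early.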

Optionally, the determining a target node in the plurality of nodes based on the obtained first geometric features of the plurality of nodes, and obtaining second geometric features corresponding to each sub-node under the target node by decoding the first geometric features of the target node is implemented by a decoding model, where the decoding model is obtained by a training device through the following training:

acquiring a training data set, wherein each sample of the training data set comprises a three-dimensional shape and a corresponding training label, and the training labels correspond to the processing mode of a three-dimensional object;

training a neural network model based on node data and training labels corresponding to each sample, wherein the neural network model comprises a cascade coding module and a decoding module, the decoding module comprises a first decoding module, and the trained decoding module is used as the decoding model;

Optionally, the decoding module further includes a second decoding module cascaded with the first decoding module, and during training, the input of the second decoding module includes the output of the first decoding module and three-dimensional position information of the plurality of sample query points, and the output is a position relationship of the plurality of sample query points and a surface structure included in a space corresponding to each node obtained by decoding;

the data processing module 120 is configured to, when performing processing on the three-dimensional object based on the second geometric features corresponding to the nodes obtained by decoding to obtain corresponding processing results:

and constructing a three-dimensional image of the three-dimensional object based on the determined position relationship corresponding to each point to be queried.

Optionally, the training device is configured to, when constructing the octree structure corresponding to each three-dimensional shape:

and taking the determined sub-nodes to be divided as new nodes to be divided.

Optionally, for each child node, when determining the first structural feature corresponding to the child node, the training device is configured to:

if the space corresponding to the sub-node contains the three-dimensional surface structure, determining a first characteristic value of a first characteristic of the sub-node as a first value, otherwise, determining the first characteristic value as a second value;

if the space corresponding to the sub-node comprises the three-dimensional surface structure, determining the geometric shape complexity of the surface structure, and determining a second characteristic value of a second characteristic corresponding to the node according to the geometric shape complexity, wherein the second characteristic value is a third value or a fourth value;

the first structural feature corresponding to the node comprises a first feature value and a second feature value.

Optionally, the training device is configured to, when determining the child node to be divided in each child node based on the first structural feature corresponding to each child node:

and determining the nodes with the first characteristic value as the first value and the second characteristic value as the third value in the first structural characteristics corresponding to each sub-node as the sub-nodes to be divided.
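The labelling and subdivision rule used when building the training octree might look like the sketch below. It assumes that the surface is given as sample points, that the "first value" and "third value" are both encoded as 1 (and the "second" and "fourth" values as 0), and that geometric complexity is supplied by a caller-provided function compared against a threshold; the z-extent measure in the usage lines is only a toy stand-in, and `max_depth` is an added safety limit.

```python
def structural_feature(points, complexity, threshold):
    """Return (first feature value, second feature value) for one node."""
    occupied = 1 if points else 0
    needs_split = 1 if occupied and complexity(points) >= threshold else 0
    return (occupied, needs_split)


def should_divide(feature):
    """A child is divided further only if it is occupied and complex."""
    return feature == (1, 1)


def octant_index(p, center):
    """Index 0..7 of the child octant containing point p (bit per axis)."""
    return sum(1 << axis for axis in range(3) if p[axis] >= center[axis])


def build_children(points, center, half, complexity, threshold,
                   depth=0, max_depth=3):
    """Split one node into 8 children, label them, and recurse where required."""
    buckets = [[] for _ in range(8)]
    for p in points:
        buckets[octant_index(p, center)].append(p)
    q = half / 2.0
    children = []
    for idx, bucket in enumerate(buckets):
        child_center = tuple(c + (q if (idx >> axis) & 1 else -q)
                             for axis, c in enumerate(center))
        feature = structural_feature(bucket, complexity, threshold)
        child = {"center": child_center, "half": q,
                 "feature": feature, "children": []}
        if should_divide(feature) and depth + 1 < max_depth:
            child["children"] = build_children(bucket, child_center, q, complexity,
                                               threshold, depth + 1, max_depth)
        children.append(child)
    return children


def z_extent(pts):
    """Toy complexity measure (assumption): height range of the points."""
    return max(p[2] for p in pts) - min(p[2] for p in pts)


points = [(0.5, 0.5, 0.5), (0.6, 0.5, 0.5),          # flat patch
          (-0.5, -0.5, -0.9), (-0.5, -0.5, -0.1)]    # tall patch
children = build_children(points, (0.0, 0.0, 0.0), 1.0, z_extent, threshold=0.5)
```

In this toy input the flat patch is labelled occupied but not divided, the tall patch is divided once more, and the six empty octants are labelled unoccupied.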

Optionally, for any sample, the encoding module obtains the first geometric characteristic corresponding to each sample node of the sample by encoding in the following manner:

and coding each sample node in the octree structure in sequence according to the order of the octree structure corresponding to the sample from bottom to top to obtain a first geometric characteristic corresponding to each sample node, wherein for the sample node without child nodes in the octree structure, the first geometric characteristic of the sample node is obtained by coding the voxelized data corresponding to the sample node by a coding module, and for the sample node with child nodes, the first geometric characteristic of the sample node is obtained by coding the first geometric characteristic and the first structural characteristic of each child sample node of the sample node by the coding module.
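The bottom-up encoding order described above amounts to a post-order traversal of the octree, as in the following sketch. The dictionary layout of a node and the two toy encoders passed in the usage lines are assumptions; in the embodiment both roles are played by the trained encoding module.

```python
def encode_bottom_up(node, encode_leaf, encode_parent):
    """Compute the first geometric feature of every node, children before parents."""
    if not node["children"]:
        # Leaf: encode the voxelized data of the node's space.
        node["g"] = encode_leaf(node["voxels"])
    else:
        child_inputs = []
        for child in node["children"]:
            encode_bottom_up(child, encode_leaf, encode_parent)
            child_inputs.append((child["g"], child["structure"]))
        # Parent: encode the children's geometric and structural features.
        node["g"] = encode_parent(child_inputs)
    return node["g"]


def mean_occupancy(voxels):
    """Toy leaf encoder (assumption): fraction of occupied voxels."""
    return sum(voxels) / len(voxels)


def mean_of_occupied(child_inputs):
    """Toy parent encoder (assumption): average feature of occupied children."""
    feats = [g for g, structure in child_inputs if structure[0] == 1]
    return sum(feats) / len(feats) if feats else 0.0


tree = {"structure": (1, 1), "children": [
    {"voxels": [1, 1, 0, 0], "structure": (1, 0), "children": []},
    {"voxels": [1, 1, 1, 1], "structure": (1, 0), "children": []},
    {"voxels": [0, 0, 0, 0], "structure": (0, 0), "children": []},
]}
root_feature = encode_bottom_up(tree, mean_occupancy, mean_of_occupied)
```

Because each parent is encoded only after all of its children, the root feature summarises the whole tree.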

Optionally, for a sample, the training label includes a first label and a second label, the first label represents a real position relationship corresponding to each of the plurality of sample query points, the second label represents a real structural feature corresponding to each sample node in the octree structure corresponding to the sample, and the output of the first decoding module further includes a second structural feature corresponding to each sample node;

the total loss function of the neural network model comprises a first loss function and a second loss function, wherein for a sample, the value of the first loss function represents the difference between the real position relation and the predicted position relation corresponding to each sample query point in the octree structure corresponding to the sample, and the value of the second loss function represents the difference between the real structure characteristic and the second structure characteristic corresponding to each sample node corresponding to the sample.

Optionally, the coding module includes a cascaded feature extraction module and a VAE, the geometric features corresponding to the nodes output by the coding module are determined based on the normal distribution parameters of the VAE, the total loss function further includes a third loss function, and for a sample, a value of the third loss function represents a difference between the normal distribution of the sample corresponding to the VAE and a reference normal distribution.
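For a VAE with a diagonal normal distribution, the third loss function is commonly computed in closed form, as sketched below. The sketch assumes that the reference normal distribution is the standard normal N(0, I) and that the encoder outputs a mean and a log-variance per dimension; neither choice is stated in the embodiment, and the function names are hypothetical.

```python
import math
import random


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def sample_feature(mu, log_var, rng):
    """Reparameterised sample: mu + sigma * eps, with eps drawn from N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


rng = random.Random(0)
mu, log_var = [1.0, 0.0], [0.0, 0.0]
kl = kl_to_standard_normal(mu, log_var)
feature = sample_feature(mu, log_var, rng)
```

The loss is zero only when the predicted distribution already equals the reference distribution, which is what keeps the feature space well behaved.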

Based on the same principle as the method provided by the embodiment of the present application, the embodiment of the present application provides an electronic device, which includes a memory and a processor; the memory has stored therein a computer program which, when executed by the processor, may carry out the method as provided in any of the alternatives of the present application.

As an alternative, FIG. 8 shows a schematic structural diagram of an electronic device to which the embodiments of the present application are applicable. As shown in FIG. 8, the electronic device 4000 includes a processor 4001 and a memory 4003. The processor 4001 is coupled to the memory 4003, for example via a bus 4002. Optionally, the electronic device 4000 may further include a transceiver 4004, which may be used for data interaction between the electronic device and other electronic devices, such as transmitting and/or receiving data. It should be noted that, in practical applications, the number of transceivers 4004 is not limited to one, and the structure of the electronic device 4000 does not constitute a limitation on the embodiments of the present application.

The processor 4001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure of the present application. The processor 4001 may also be a combination that performs a computing function, for example a combination of one or more microprocessors, a combination of a DSP and a microprocessor, or the like.

Bus 4002 may include a path that carries information between the aforementioned components. The bus 4002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 4002 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 8, but this is not intended to represent only one bus or type of bus.

The memory 4003 may be a ROM (Read Only Memory) or another type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or another type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to these.

The memory 4003 is used for storing application program codes (computer programs) for executing the present scheme, and execution is controlled by the processor 4001. Processor 4001 is configured to execute application code stored in memory 4003 to implement what is shown in the foregoing method embodiments.

The electronic device includes, but is not limited to, a user terminal device and a server, where the server may be a physical server, a cloud server, a single server, a server cluster, or the like. Optionally, an application may be designed based on the method provided in the embodiments of the present application. A user may install a client of the application and communicate with a server of the application through the client to implement 3D modeling. For example, the user may send a two-dimensional image of an object to the server through the client, and the server may use the scheme provided in the embodiments of the present application to output a three-dimensional image of the object based on the two-dimensional image.

The embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the program runs on a computer, the computer can be enabled to execute the corresponding contents in the foregoing method embodiments.

It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

The foregoing describes only some embodiments of the present invention. It should be noted that those skilled in the art may make various modifications and refinements without departing from the principle of the present invention, and these shall also fall within the protection scope of the present invention.

Claims

1. A data processing method, comprising:

acquiring source data corresponding to the three-dimensional object;

acquiring first geometric characteristics of a plurality of nodes corresponding to the source data based on the source data, wherein the plurality of nodes comprise a plurality of levels of nodes with tree structures, the nodes of different levels correspond to different spaces in a three-dimensional space of the three-dimensional object, for any parent node in the plurality of nodes, the first geometric characteristics of the parent node are obtained based on the first structural characteristics and the first geometric characteristics of each child node of the node, and the first structural characteristics of one node represent the occupation condition of the surface structure of the three-dimensional object contained in the space corresponding to the node relative to the whole surface of the three-dimensional object; the first structural feature of a node also characterizes the geometric complexity of the surface structure of the three-dimensional object contained in the space corresponding to the node;

processing the three-dimensional object based on the second geometric characteristics corresponding to the nodes obtained by decoding to obtain corresponding processing results;

wherein the target nodes comprise a root node in the plurality of nodes and a target child node satisfying a set condition in the plurality of nodes; the condition that one sub-node meets the set condition means that the space corresponding to the node contains a surface structure, or the space contains the surface structure and the complexity of the geometric shape of the surface structure is greater than or equal to a complexity threshold, or the complexity is a specified value;

whether a child node meets a set condition is determined based on a second structural feature corresponding to the node, wherein the second structural feature corresponding to the node is obtained by decoding a first geometric feature of a parent node of the node.

2. The method of claim 1, wherein the source data comprises at least one of:

point cloud data; a two-dimensional image; grid data;

the processing of the three-dimensional object is performed based on the second geometric features corresponding to the nodes obtained by decoding, so as to obtain a corresponding processing result, wherein the processing result includes any one of the following items:

3. The method according to claim 2, wherein constructing the three-dimensional image of the three-dimensional object based on the decoded second geometric features corresponding to the nodes comprises:

and constructing a three-dimensional image of the three-dimensional object based on the determined position relation corresponding to each point to be queried.

4. The method according to claim 1, wherein the determining a target node in the plurality of nodes based on the obtained first geometric features of the plurality of nodes, and obtaining second geometric features corresponding to the child nodes under the target node by decoding the first geometric features of the target node, is implemented by taking a root node in the plurality of nodes as an initial node to be processed in an order from high to low of the hierarchy corresponding to the plurality of nodes, and repeatedly performing the following operations:

taking each target child node as a new node to be processed;

and determining a target node in the plurality of nodes, wherein the determined target node comprises the root node and each target child node.

5. The method according to claim 4, wherein the second structural feature comprises, for a node, a first feature value and a second feature value, the first feature value characterizes an occupation situation corresponding to the node, the occupation situation is whether the space corresponding to the node contains the surface structure of the three-dimensional object, and the second feature value characterizes the geometric complexity of the surface structure of the three-dimensional object contained in the space corresponding to the node;

the determining a target child node in each child node based on the second structural feature of each child node includes:

and for one child node, if the surface structure containing the three-dimensional object in the space corresponding to the node is determined based on the first characteristic value of the child node, and the space corresponding to the geometric shape complexity characteristics of the surface structure is determined to be required to be subdivided based on the second characteristic value corresponding to the child node, determining the child node as a target child node.

6. The method according to any one of claims 1 to 5, wherein the determining a target node among the plurality of nodes based on the obtained first geometric features of the plurality of nodes, and obtaining the second geometric features corresponding to the sub-nodes under the target node by decoding the first geometric features of the target node, is implemented by a decoding model, and the decoding model is trained by:

acquiring a training data set, wherein each sample of the training data set comprises a three-dimensional shape and a corresponding training label, and the training label corresponds to the processing mode of the three-dimensional object;

constructing an octree structure corresponding to each three-dimensional shape, and acquiring node data corresponding to each sample node of the octree structure, wherein for each sample node, the node data comprises voxelized data of a space corresponding to the sample node and a first structural characteristic corresponding to the sample node;

training a neural network model based on the node data and training labels corresponding to the samples, wherein the neural network model comprises an encoding module and a decoding module that are cascaded, the decoding module comprises a first decoding module, and the trained decoding module is used as the decoding model;

for one sample, the input of the encoding module comprises node data corresponding to the sample, the output of the encoding module is a first geometric feature corresponding to each sample node, the input of the first decoding module comprises the first geometric feature of each sample node in an octree structure, and the output of the first decoding module comprises a second geometric feature corresponding to each sample node obtained through decoding.

7. The method according to claim 6, wherein the decoding module further comprises a second decoding module cascaded with the first decoding module, and during training, the input of the second decoding module comprises the output of the first decoding module and three-dimensional position information of a plurality of sample query points, and the output of the second decoding module is the positional relationship between the plurality of sample query points and the surface structure contained in the space corresponding to each node obtained by decoding;

the processing the three-dimensional object based on the second geometric features corresponding to the nodes obtained by decoding, to obtain a corresponding processing result, comprises:

for each node obtained by decoding, determining, through the trained second decoding module, the positional relationship between each point to be queried and the surface structure contained in the space corresponding to the node, based on the second geometric feature corresponding to the node and the three-dimensional position information of each point to be queried;

8. The method of claim 7, wherein said constructing an octree structure corresponding to each of said three-dimensional shapes comprises:

taking the root node as an initial node to be divided, repeatedly executing the following operations on the node to be divided until the node to be divided does not exist, and obtaining an octree structure corresponding to the three-dimensional shape based on the obtained nodes:

dividing the space corresponding to the node to be divided into eight subspaces, wherein the nodes corresponding to the eight subspaces are the eight child nodes of the node;

determining sub-nodes to be divided in each sub-node based on first structural features corresponding to the sub-nodes;

and taking the determined sub-nodes to be divided as new nodes to be divided.
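The division loop of claim 8 can be sketched as follows. This is an illustrative sketch only: the surface is assumed to be given as sampled points inside a cube, and `should_divide` is a hypothetical stand-in for the decision based on the first structural feature (here, "more than `max_points` surface samples in the cell"). The `max_depth` bound is likewise an assumption added so the loop terminates.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class OctreeNode:
    origin: Tuple[float, float, float]   # minimum corner of the node's cube
    size: float                          # edge length of the cube
    depth: int
    children: List["OctreeNode"] = field(default_factory=list)


def points_in_cell(points, origin, size):
    ox, oy, oz = origin
    return [p for p in points
            if ox <= p[0] < ox + size
            and oy <= p[1] < oy + size
            and oz <= p[2] < oz + size]


def should_divide(points, node, max_points, max_depth):
    """Hypothetical criterion standing in for the first structural feature."""
    inside = points_in_cell(points, node.origin, node.size)
    return len(inside) > max_points and node.depth < max_depth


def build_octree(points, size=1.0, max_points=1, max_depth=4):
    root = OctreeNode((0.0, 0.0, 0.0), size, 0)
    pending = [root]                     # the root is the initial node to be divided
    while pending:                       # repeat until no node to be divided remains
        node = pending.pop()
        half = node.size / 2
        ox, oy, oz = node.origin
        # divide the node's space into eight subspaces, one child node per subspace
        node.children = [
            OctreeNode((ox + dx * half, oy + dy * half, oz + dz * half),
                       half, node.depth + 1)
            for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)
        ]
        # the children selected for division become the new nodes to be divided
        pending.extend(c for c in node.children
                       if should_divide(points, c, max_points, max_depth))
    return root


def count_nodes(node):
    return 1 + sum(count_nodes(c) for c in node.children)
```

An explicit work list is used instead of recursion so that the code mirrors the claim's wording: nodes are taken from the list, divided, and their qualifying children are put back until the list is empty.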

9. The method of claim 8, wherein said determining, for each of said child nodes, a first structural characteristic corresponding to said child node comprises:

if the space corresponding to the child node contains the surface structure of the three-dimensional shape, determining a first characteristic value of a first characteristic of the child node as a first value, otherwise, determining the first characteristic value as a second value;

if the space corresponding to the sub-node contains the surface structure of the three-dimensional shape, determining the geometric shape complexity of the surface structure, and determining a second characteristic value of a second characteristic corresponding to the sub-node according to the geometric shape complexity, wherein the second characteristic value is a third value or a fourth value;

if the space corresponding to the sub-node does not contain the surface structure of the three-dimensional shape, determining a second characteristic value of a second characteristic corresponding to the sub-node as a fourth value;

and the first structural characteristic corresponding to the node comprises the first characteristic value and the second characteristic value.
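The value assignment of claim 9 can be illustrated with a short sketch. The concrete numbers chosen for the first to fourth values, the scalar `complexity` score, and the `threshold` are all assumptions made for illustration; the claim does not fix any of them.

```python
# Assumed encodings; the claim only requires four named values.
FIRST_VALUE, SECOND_VALUE = 1, 0    # first characteristic: surface present / absent
THIRD_VALUE, FOURTH_VALUE = 1, 0    # second characteristic: complex / not complex


def first_structural_feature(contains_surface, complexity=0.0, threshold=0.5):
    """Return (first characteristic value, second characteristic value).

    complexity and threshold are hypothetical: any measure of geometric
    complexity of the contained surface could be substituted.
    """
    if not contains_surface:
        # empty space: second value for occupancy, fourth value for complexity
        return (SECOND_VALUE, FOURTH_VALUE)
    second = THIRD_VALUE if complexity > threshold else FOURTH_VALUE
    return (FIRST_VALUE, second)
```

Under these assumed encodings, an empty cell always yields `(0, 0)` regardless of the complexity argument, which matches the claim's rule that the second characteristic takes the fourth value when no surface is present.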

10. The method according to claim 9, wherein the determining the sub-nodes to be divided in each of the sub-nodes based on the first structural feature corresponding to each of the sub-nodes comprises:

11. The method of claim 6, wherein for any of the samples, the encoding module obtains the first geometric feature corresponding to each sample node of the sample by encoding:

and sequentially coding each sample node in the octree structure according to the order of the octree structure corresponding to the sample from bottom to top to obtain a first geometric characteristic corresponding to each sample node, wherein for the sample node without child nodes in the octree structure, the first geometric characteristic of the sample node is obtained by coding the voxelized data corresponding to the sample node through a coding module, and for the sample node with child nodes, the first geometric characteristic of the sample node is obtained by coding the first geometric characteristic and the first structural characteristic of each child sample node of the sample node through the coding module.
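The bottom-up encoding order of claim 11 can be sketched with a post-order traversal. The two encoder functions below are toy arithmetic stand-ins for the neural coding module and are purely hypothetical; only the traversal order and the choice of inputs (voxelized data for leaves, child features plus child structural features for internal nodes) follow the claim.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SampleNode:
    voxels: List[int] = field(default_factory=list)   # voxelized data (used for leaves)
    structure: Tuple[int, int] = (0, 0)               # first structural feature
    children: List["SampleNode"] = field(default_factory=list)
    feature: Optional[List[float]] = None             # first geometric feature


def encode_leaf(voxels):
    """Toy leaf encoder (assumption): fraction of occupied voxels."""
    return [sum(voxels) / len(voxels)]


def encode_parent(child_features, child_structures):
    """Toy parent encoder (assumption): mean of child features, each
    weighted by the child's occupancy value."""
    total = sum(f[0] * s[0] for f, s in zip(child_features, child_structures))
    return [total / len(child_features)]


def encode_bottom_up(node):
    """Encode children before their parent, so features flow from the
    lowest level of the octree up to the root."""
    if not node.children:
        node.feature = encode_leaf(node.voxels)
    else:
        child_features = [encode_bottom_up(c) for c in node.children]
        child_structures = [c.structure for c in node.children]
        node.feature = encode_parent(child_features, child_structures)
    return node.feature
```

In a trained model the two toy functions would be replaced by learned networks; the recursion itself would stay the same.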

12. A data processing apparatus, characterized by comprising:

the data acquisition module is used for acquiring source data of the three-dimensional object;

a data processing module for performing the following operations:

whether a child node meets the set condition is determined based on a second structural feature corresponding to the node, wherein the second structural feature corresponding to the node is obtained by decoding the first geometric feature of the parent node of the node.

13. An electronic device, comprising a memory having a computer program stored therein and a processor that, when running the computer program, performs the method of any of claims 1 to 11.

14. A computer-readable storage medium, characterized in that a computer program is stored in the storage medium, which computer program, when executed by a processor, causes the processor to carry out the method of any one of claims 1-11.