CN115511050B

CN115511050B - Deep learning model with simplified three-dimensional model grid and training method thereof

Info

Publication number: CN115511050B
Application number: CN202211170843.7A
Authority: CN
Inventors: 杜创; 杨会兵; 魏亚兴; 潘竹; 何堃
Original assignee: Hefei Comprehensive Pipe Gallery Investment Operation Co ltd
Current assignee: Hefei Comprehensive Pipe Gallery Investment Operation Co ltd; Hefei Ruisheng Smart Technology Co ltd
Priority date: 2022-09-23
Filing date: 2022-09-23
Publication date: 2023-07-21
Anticipated expiration: 2042-09-23
Also published as: CN115511050A

Abstract

The invention belongs to the fields of computer graphics, deep machine learning, and industrial automation virtual simulation, and provides a deep learning model for three-dimensional model mesh simplification, comprising a recurrent neural network layer, fully connected layers, 3D convolution layers, an encoder layer and a decoder layer, and an attention layer. A training method for three-dimensional model mesh simplification is also provided. The rasterized structural similarity function of the invention preserves the similarity of the rendered images while greatly reducing the total number of triangular faces of the model; the simplification effect improves as the number of training samples and training rounds increases; and the method offers good simplification performance and execution efficiency together with high robustness.

Description

Deep learning model for three-dimensional model mesh simplification and training method thereof

Technical Field

The invention belongs to the fields of computer graphics, deep machine learning, and industrial automation virtual simulation, and particularly relates to a deep learning model for three-dimensional model mesh simplification and a training method thereof.

Background

With social development and scientific progress, three-dimensional data has become a common and important data type, spreading from its original fields, such as engineering applications and high-performance games, into all aspects of production and daily life. At the same time, rapid advances in three-dimensional data acquisition and modeling have sharply increased the fineness of three-dimensional models. Because three-dimensional data can be regarded as image data with an added dimension, the growth in storage and computation caused by higher precision is even more severe than for images.

Three-dimensional models are mainly represented as voxels, point clouds, and triangular meshes. Triangular meshes are the most commonly used: compared with voxel and point cloud data they have the highest information entropy, their vector-form data format is not limited by resolution, and current mainstream graphics processing units (GPUs) are optimized for them.

To reduce resource usage at every stage of three-dimensional data handling, including storage, transmission, loading, and rendering, simplifying the three-dimensional model and modeling it at multiple resolutions is a natural approach.

Many methods for three-dimensional model simplification exist, such as traditional decimation algorithms based on vertex deletion and vertex clustering, iterative algorithms based on edge contraction, and filtering algorithms based on the Fourier transform or wavelet transform. As deep machine learning has matured, learning-based methods have also appeared; for example, the Jittor team at Tsinghua University proposed SubdivNet, a convolutional neural network for triangular mesh faces that migrates image network architectures to three-dimensional geometry learning in order to simplify three-dimensional models.

However, these methods suffer from a series of problems, such as broken mesh surfaces, severe loss of contour features, and large amounts of detail retained in enclosed, invisible regions. Methods such as SubdivNet, which rely on triangle remeshing or on three-dimensional convolution after voxelization, can increase the number of triangular faces during processing and severely clip sharp parts of the model when the geometric elements of the model differ greatly in size.

Moreover, these methods mainly consider spatial geometry during simplification and pay insufficient attention to the images rendered from the three-dimensional model. As a result, some simplified models are highly similar to the original in spatial geometry yet differ considerably in the final rendering. This is most noticeable when level-of-detail (LoD) techniques are used for real-time rendering: switching the three-dimensional model between detail levels produces a clearly perceptible change.

There is therefore a need for a three-dimensional model simplification method that overcomes the shortcomings of the prior-art methods described above.

Disclosure of Invention

The invention aims to provide a technique that applies deep learning to mesh simplification of three-dimensional models: a deep neural network model is designed and then trained and tuned to obtain the corresponding objective function. The model simplifies the mesh according to the input source three-dimensional model M and a vertex retention coefficient, thereby effectively reducing the cost of three-dimensional data during storage, transmission, loading, rendering, and related processes. The invention provides a deep learning model for three-dimensional model mesh simplification and a training method thereof. The deep learning model comprises a recurrent neural network layer, fully connected layers, 3D convolution layers, an encoder layer and a decoder layer, and an attention layer. The recurrent neural network layer serves as the input layer and receives three-dimensional model mesh face data of variable length. Two or more fully connected layers follow the recurrent neural network layer and extract the features and connectivity of the mesh face data input from the previous layer. The fully connected layers are followed by a data deformation step, which applies an affine transformation in three-dimensional space to the data from the preceding fully connected layer. 3D convolution layers then successively downsample the input and extract its shape features. The 3D convolution layers are followed by an encoder layer and a decoder layer, which implicitly encode and decode the input three-dimensional model. The encoder layer and the decoder layer are followed by two or more 3D convolution layers, which successively upsample the input and construct and refine contour features. These 3D convolution layers are followed by two or more fully connected layers, which perform translation precoding on the simplified model. The attention layer follows these fully connected layers; it cross-attends over the 3D convolution layers before and after the encoding and decoding and controls the vertex output of the three-dimensional model according to the trained weights, so as to simplify the model. The attention layer is followed by a recurrent neural network layer that serves as the output layer and outputs the simplified model as a floating-point array of shape L×3×3.

In a second aspect, a training method for three-dimensional model mesh simplification is provided. In the initial training stage, the three-dimensional Jaccard similarity coefficient is used as the loss function to accelerate convergence of the model; after the model converges, training switches to the rasterized structural similarity function as the loss function; during training with the rasterized structural similarity function, the model is further optimized by randomizing the illumination and viewing angle of the rasterization environment.
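As an illustration of the initial-stage loss, the following minimal sketch computes a three-dimensional Jaccard similarity coefficient (3D IoU) between two meshes by mapping their vertices onto a coarse occupancy grid. The patent does not state how the coefficient is evaluated; the vertex-based voxelization, the grid resolution, the coordinate range, and the names `voxelize` and `jaccard_3d_loss` are assumptions made for this example. This set-based version is not differentiable, so an actual training setup would presumably need a soft variant.

```python
from typing import Iterable, Set, Tuple

Vertex = Tuple[float, float, float]
Triangle = Tuple[Vertex, Vertex, Vertex]


def voxelize(mesh: Iterable[Triangle], resolution: int = 8,
             lo: float = -1.0, hi: float = 1.0) -> Set[Tuple[int, ...]]:
    """Map every vertex of the mesh to the index of the grid cell containing it.

    Assumption: coordinates lie in [lo, hi]; out-of-range values are clamped.
    """
    cells = set()
    scale = resolution / (hi - lo)
    for triangle in mesh:
        for vertex in triangle:
            cell = tuple(
                min(resolution - 1, max(0, int((c - lo) * scale)))
                for c in vertex
            )
            cells.add(cell)
    return cells


def jaccard_3d_loss(y_true, y_pred, resolution: int = 8) -> float:
    """Return 1 - |A ∩ B| / |A ∪ B| over the occupied voxel sets."""
    a = voxelize(y_true, resolution)
    b = voxelize(y_pred, resolution)
    union = a | b
    if not union:
        return 0.0  # two empty meshes are treated as identical
    return 1.0 - len(a & b) / len(union)
```

Identical meshes give a loss of 0 and meshes with no shared occupied cell give a loss of 1, which is the coarse geometric signal the initial stage relies on before the image-based loss takes over.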

Further, the code of the rasterized structural similarity function is:

def ssim_rasterize(y_true, y_pred):
    return ssim(rasterize(y_true), rasterize(y_pred))

Further, the rasterized structural similarity function rasterizes the true value y and the predicted value y', each a multidimensional array representation of a three-dimensional model, and then computes the structural similarity of the two resulting images.
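For reference, the structural similarity (SSIM) index in its commonly used form, applied to the two rasterized images, can be written as below. Here R denotes the rasterization operator, \mu and \sigma^2 are the mean and variance of an image, and \sigma_{II'} is the covariance of the two images. The stabilizing constants C_1 and C_2 and the use of 1 - SSIM as the quantity to minimize are conventional assumptions; the patent's own code returns the SSIM value directly and does not state them.

```latex
\mathrm{SSIM}(I, I') =
  \frac{(2\mu_I \mu_{I'} + C_1)\,(2\sigma_{II'} + C_2)}
       {(\mu_I^2 + \mu_{I'}^2 + C_1)\,(\sigma_I^2 + \sigma_{I'}^2 + C_2)},
\qquad I = R(y), \quad I' = R(y'),
\qquad \mathcal{L}(y, y') = 1 - \mathrm{SSIM}(I, I')
```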

Further, the rasterized structural similarity function is calculated as follows: randomly generate the elements of the rasterization shader, namely the shader type, the camera type and position, and the light position and intensity; rasterize the input three-dimensional model Y and the output three-dimensional model Y' under this shader; substitute the planar images I and I' produced by rasterized rendering into the structural similarity function to obtain the loss value; within the same training step, the input three-dimensional model Y and the output three-dimensional model Y' may be rasterized two or more times and the resulting loss values reduced to their average, which accelerates the training process.

Further, before the initial training stage begins, the data sample files are preprocessed: each file is converted into a floating-point array of shape L×3×3, and each individual model is normalized by centering it at the origin of the three-dimensional coordinate system and scaling it to a data space with variance 1 and mean 0.
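The preprocessing step can be sketched as follows. The sketch interprets the L×3×3 array as L triangular faces, each with 3 vertices of 3 coordinates, and computes the mean per axis and the variance over all coordinate values; both choices, and the name `normalize_mesh`, are assumptions, since the patent does not say how the statistics are taken.

```python
from typing import List, Sequence

Vertex = List[float]
Triangle = List[Vertex]  # 3 vertices x 3 coordinates


def normalize_mesh(triangles: Sequence[Sequence[Sequence[float]]]) -> List[Triangle]:
    """Center an L x 3 x 3 triangle array at the origin and scale it to unit variance.

    Assumption: the centroid is taken per axis over all face vertices, and
    the variance over all 9 * L centered coordinate values.
    """
    vertices = [v for tri in triangles for v in tri]
    n = len(vertices)
    # Per-axis centroid (a vertex shared by several faces counts once per face).
    centroid = [sum(v[axis] for v in vertices) / n for axis in range(3)]
    centered = [
        [[v[axis] - centroid[axis] for axis in range(3)] for v in tri]
        for tri in triangles
    ]
    # Every axis now has mean 0, so the overall mean is 0 as well.
    values = [c for tri in centered for v in tri for c in v]
    variance = sum(c * c for c in values) / len(values)
    std = variance ** 0.5 or 1.0  # guard against a degenerate single-point mesh
    return [[[c / std for c in v] for v in tri] for tri in centered]
```

Normalizing each model separately keeps models of very different physical sizes in the same numeric range before they reach the network.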

The beneficial effects are as follows:

The method resembles a natural language processing (NLP) model that extracts a summary from a long text. The rasterized structural similarity function follows directly from the purpose of three-dimensional model simplification, so the similarity of the model's rendered images is preserved even when the total number of triangular faces is greatly reduced, and the simplification effect improves as the number of training samples and training rounds increases. Compared with existing methods, the method offers good simplification performance and execution efficiency together with higher robustness.

Drawings

The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:

FIG. 1 is a logic diagram of the deep learning model for three-dimensional model mesh simplification.

Detailed Description

The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.

Owing to the way the human eye perceives three-dimensional images and to the limitations of existing display devices, three-dimensional data is still presented mainly as planar images. It is therefore a feasible goal to reduce the number of triangular faces of a three-dimensional model as far as possible while preserving the details of the two-dimensional images rendered from it. In other words, the closer the planar images I and I' produced by rasterized rendering of the three-dimensional model X and the simplified model X' are to each other, the better the simplification effect; and the lower the ratio of the total number N' of triangular faces of the simplified model to the total number N of triangular faces of the original model, the more effective the simplification. The invention uses an image-driven three-dimensional model simplification method and fits the simplification function f by deep learning, which addresses the problem that the many related factors cannot be hard-coded. The technical scheme of the invention mainly comprises the following:

The logic of the deep learning model for three-dimensional model mesh simplification is shown in FIG. 1:

1. The input layer is a recurrent neural network layer, which receives three-dimensional model mesh face data of variable length;

2. Two or more fully connected layers follow, which extract the features and connectivity of the mesh face data input from the previous layer;

3. The data is then deformed: an affine transformation in three-dimensional space is applied to the data from the preceding fully connected layer;

4. 3D convolution layers then successively downsample the input and extract its shape features;

5. An encoder layer and a decoder layer follow, which implicitly encode and decode the input three-dimensional model;

6. Two or more 3D convolution layers follow, which successively upsample the input and construct and refine contour features;

7. Two or more fully connected layers follow, which perform translation precoding on the simplified model;

8. The attention layer follows. It uses a cross-attention mechanism that takes as input the fully connected layer after the input layer and the fully connected layer after the bottleneck, obtains the cross weights of the hidden information between the two, and filters the weights at the output with a learned threshold, thereby extracting the key information and achieving the simplification of the three-dimensional model;

9. The output layer is a recurrent neural network layer, which outputs the simplified model as a floating-point array of shape L×3×3.
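The thresholded cross-attention of step 8 can be illustrated with a small sketch. It computes scaled dot-product attention weights between two sets of feature vectors and keeps only the rows whose strongest weight reaches a threshold. Which layer supplies the queries and which the keys, the use of the maximum weight as the importance score, and the names `cross_attention_weights` and `select_faces` are all assumptions; in the patent the threshold is learned, whereas here it is passed in as a fixed number.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax of one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_weights(queries: Matrix, keys: Matrix) -> Matrix:
    """Scaled dot-product attention weights; each row sums to 1."""
    scale = math.sqrt(len(keys[0]))
    return [
        softmax([sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys])
        for q in queries
    ]


def select_faces(queries: Matrix, keys: Matrix, threshold: float) -> List[int]:
    """Return indices of key rows (candidate faces) whose strongest
    attention weight reaches the threshold.

    Assumption: keys come from the early fully connected features and
    queries from the features after the bottleneck.
    """
    weights = cross_attention_weights(queries, keys)
    importance = [max(row[j] for row in weights) for j in range(len(keys))]
    return [j for j, score in enumerate(importance) if score >= threshold]
```

Raising the threshold discards more candidate faces, which is one plausible way for a learned cutoff to control how aggressively the model simplifies.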

After the three-dimensional mesh computation deep neural network model is constructed, it is trained. Because the deep neural network model is difficult to converge, the following training method is provided:

1. Because both the input X and the expected output Y of the model are the original three-dimensional model, the method requires no data labeling;

2. The data samples for the model may be taken from open data sets, from the user's own data sets, or from automatically generated three-dimensional models, provided that the sample data is sufficiently general;

3. Before training, the data sample files are preprocessed: each file is converted into a floating-point array of shape L×3×3, and each model is normalized by centering it at the origin of the three-dimensional coordinate system and scaling it to a data space with variance 1 and mean 0;

4. In the initial training stage, 3D IoU (the three-dimensional Jaccard similarity coefficient) is used as the loss function to accelerate convergence of the model;

5. After the model converges, training switches to the rasterized structural similarity function as the loss function;

6. During training with the rasterized structural similarity function as the loss function, the model is further optimized by randomizing the illumination and viewing angle of the rasterization environment.
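The switch between the two loss functions in steps 4 and 5 can be sketched as a small wrapper. The patent does not define when the model counts as converged; the plateau test below (no improvement larger than `min_delta` for `patience` consecutive epochs) and the class name `TwoStageLoss` are hypothetical choices for this example.

```python
from typing import Callable


class TwoStageLoss:
    """Use `warmup_loss` until it plateaus, then switch to `final_loss` for good."""

    def __init__(self, warmup_loss: Callable, final_loss: Callable,
                 patience: int = 3, min_delta: float = 1e-3):
        self.warmup_loss = warmup_loss
        self.final_loss = final_loss
        self.patience = patience      # epochs without improvement before switching
        self.min_delta = min_delta    # smallest decrease that counts as improvement
        self.best = float("inf")
        self.stale_epochs = 0
        self.switched = False

    def __call__(self, y_true, y_pred):
        loss = self.final_loss if self.switched else self.warmup_loss
        return loss(y_true, y_pred)

    def end_epoch(self, epoch_loss: float) -> None:
        """Report the mean warm-up loss of an epoch; switch once it stops improving."""
        if self.switched:
            return
        if epoch_loss < self.best - self.min_delta:
            self.best = epoch_loss
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
            if self.stale_epochs >= self.patience:
                self.switched = True
```

In use, the training loop would call the object as its loss function and call `end_epoch` once per epoch with the mean loss; after the switch, every later call is routed to the rasterized structural similarity loss.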

The principle of the rasterized structural similarity function is to rasterize the true value y and the predicted value y', each a multidimensional array representation of a three-dimensional model, and then compute the structural similarity of the two. The calculation steps are as follows:

1. Randomly generate the elements of the rasterization shader: the shader type, the camera type and position, the light position and intensity, and so on;

2. Rasterize the input three-dimensional model Y and the output three-dimensional model Y' under this shader;

3. Substitute the planar images I and I' produced by rasterized rendering into the SSIM (structural similarity) function to obtain the loss value;

4. Within the same training step, the three-dimensional models Y and Y' may be rasterized two or more times, and the loss values reduced to their average, to accelerate the training process.

The definition code is as follows:

def ssim_rasterize(y_true, y_pred):
    return ssim(rasterize(y_true), rasterize(y_pred))
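The patent leaves `ssim` and `rasterize` undefined. The following self-contained sketch fills them in with deliberately simple stand-ins so that the four steps above can be run end to end: a toy orthographic, flat-shaded z-buffer rasterizer and a global (single-window) SSIM. Only the camera yaw and the light intensity are randomized; shader type, camera type, and light position are omitted. The helper `rotate_y`, the parameters `renders`, `seed`, and `size`, and the assumed coordinate range of [-2, 2] are all assumptions of this example, and the pure-Python code is not differentiable, so it illustrates the computation rather than a usable training loss.

```python
import math
import random


def rotate_y(v, angle):
    """Rotate a point about the y axis (stands in for moving the camera)."""
    x, y, z = v
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


def rasterize(mesh, yaw=0.0, light=1.0, size=16):
    """Toy orthographic rasterizer: flat-shaded triangles with a z-buffer."""
    image = [[0.0] * size for _ in range(size)]
    depth = [[-math.inf] * size for _ in range(size)]
    to_px = lambda c: (c + 2.0) / 4.0 * (size - 1)  # assumes coords in [-2, 2]
    for tri in mesh:
        a, b, c = (rotate_y(v, yaw) for v in tri)
        ux, uy, uz = (b[i] - a[i] for i in range(3))
        vx, vy, vz = (c[i] - a[i] for i in range(3))
        nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
        norm = math.sqrt(nx * nx + ny * ny + nz * nz)
        if norm == 0.0:
            continue  # degenerate triangle
        shade = light * abs(nz) / norm  # light shines along the view axis
        (ax, ay), (bx, by), (cx, cy) = ((to_px(p[0]), to_px(p[1])) for p in (a, b, c))
        area = (bx - ax) * (cy - ay) - (cx - ax) * (by - ay)
        if area == 0.0:
            continue  # triangle is edge-on to the camera
        for py in range(size):
            for px in range(size):
                # Barycentric weights of the pixel with respect to a, b, c.
                w0 = ((bx - px) * (cy - py) - (cx - px) * (by - py)) / area
                w1 = ((cx - px) * (ay - py) - (ax - px) * (cy - py)) / area
                w2 = 1.0 - w0 - w1
                if w0 < 0 or w1 < 0 or w2 < 0:
                    continue  # pixel lies outside the triangle
                z = w0 * a[2] + w1 * b[2] + w2 * c[2]
                if z > depth[py][px]:  # larger z is treated as nearer
                    depth[py][px] = z
                    image[py][px] = shade
    return image


def ssim(img_a, img_b, c1=0.01 ** 2, c2=0.03 ** 2):
    """Global (single-window) SSIM for images with values in [0, 1]."""
    xs = [p for row in img_a for p in row]
    ys = [p for row in img_b for p in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx * mx + my * my + c1) * (vx + vy + c2))


def ssim_rasterize(y_true, y_pred, renders=4, seed=None):
    """Mean SSIM of y_true and y_pred over several randomly drawn views."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(renders):
        yaw = rng.uniform(0.0, 2.0 * math.pi)  # random camera angle
        light = rng.uniform(0.5, 1.0)          # random light intensity
        total += ssim(rasterize(y_true, yaw, light), rasterize(y_pred, yaw, light))
    return total / renders
```

Both models are always rendered under the same randomly drawn view, so any difference in the score comes from the geometry rather than from the camera or lighting; averaging over several views per step corresponds to step 4.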

Finally, it should be noted that the foregoing is only a preferred embodiment of the present invention and is not intended to limit it. Although the invention has been described in detail with reference to the foregoing embodiment, those skilled in the art may still modify the technical solutions described therein or substitute equivalents for some of their technical features. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within its scope of protection.

Claims

1. A training method for a deep learning model based on three-dimensional model mesh simplification, characterized by comprising: an initial training stage in which the three-dimensional Jaccard similarity coefficient is used as the loss function to accelerate convergence of the model; after the model converges, switching to the rasterized structural similarity function as the loss function for training; and, during training with the rasterized structural similarity function, further optimizing the model by randomizing the illumination and viewing angle of the rasterization environment;

the deep learning model for three-dimensional model mesh simplification comprises a recurrent neural network layer, fully connected layers, 3D convolution layers, an encoder layer and a decoder layer, and an attention layer;

the recurrent neural network layer serves as the input layer and receives three-dimensional model mesh face data of variable length;

two or more fully connected layers follow the recurrent neural network layer and extract the features and connectivity of the mesh face data input from the previous layer;

the fully connected layers are followed by a data deformation step, which applies an affine transformation in three-dimensional space to the data from the preceding fully connected layer;

3D convolution layers then successively downsample the input and extract its shape features;

the 3D convolution layers are followed by an encoder layer and a decoder layer, which implicitly encode and decode the input three-dimensional model;

the encoder layer and the decoder layer are followed by two or more 3D convolution layers, which successively upsample the input and construct and refine contour features;

these 3D convolution layers are followed by two or more fully connected layers, which perform translation precoding on the simplified model;

the attention layer follows these fully connected layers; it cross-attends over the fully connected layers before and after the encoding and decoding and controls the vertex output of the three-dimensional model according to the trained weights, so as to simplify the model;

the attention layer is followed by a recurrent neural network layer that serves as the output layer and outputs the simplified model as a floating-point array of shape L×3×3.

2. The training method for a deep learning model based on three-dimensional model mesh simplification according to claim 1, characterized in that the code of the rasterized structural similarity function is:

def ssim_rasterize(y_true, y_pred):
    return ssim(rasterize(y_true), rasterize(y_pred))

3. The training method for a deep learning model based on three-dimensional model mesh simplification according to claim 2, characterized in that the rasterized structural similarity function rasterizes the true value y and the predicted value y', each a multidimensional array representation of a three-dimensional model, and then computes the structural similarity of the two.

4. The training method for a deep learning model based on three-dimensional model mesh simplification according to claim 3, characterized in that the rasterized structural similarity function is calculated as follows: randomly generate the elements of the rasterization shader, namely the shader type, the camera type and position, and the light position and intensity; rasterize the input three-dimensional model Y and the output three-dimensional model Y' under this shader; substitute the planar images I and I' produced by rasterized rendering into the structural similarity function to obtain the loss value; within the same training step, the input three-dimensional model Y and the output three-dimensional model Y' may be rasterized two or more times and the resulting loss values reduced to their average, which accelerates the training process.

5. The training method for a deep learning model based on three-dimensional model mesh simplification according to claim 1, characterized in that, before the initial training stage begins, the data sample files are preprocessed: each file is converted into a floating-point array of shape L×3×3, and each individual model is normalized by centering it at the origin of the three-dimensional coordinate system and scaling it to a data space with variance 1 and mean 0.