WO2022236802A1 - Method and apparatus for reconstructing object model, and terminal device and storage medium - Google Patents



Publication number
WO2022236802A1
WO2022236802A1 (PCT/CN2021/093783; CN2021093783W)
Authority
WO
WIPO (PCT)
Prior art keywords
vertices
target
matrix
vertex
feature
Prior art date
Application number
PCT/CN2021/093783
Other languages
French (fr)
Chinese (zh)
Inventor
王磊
钟宏亮
林佩珍
程俊
Original Assignee
中国科学院深圳先进技术研究院 (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences)
Priority date
Filing date
Publication date
Application filed by 中国科学院深圳先进技术研究院 (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences)
Priority to PCT/CN2021/093783
Publication of WO2022236802A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects

Definitions

  • the present application relates to the technical field of image processing, and in particular to a reconstruction method, device, terminal device and storage medium of an object model.
  • the 3D model reconstruction technology of indoor scenes has great application value in the fields of virtual reality and human-computer interaction.
  • a monocular 3D object model reconstruction method based on deep learning is usually used, which generally adopts an end-to-end encoder-decoder model.
  • in an end-to-end encoder-decoder model, usually only the global information of the object image and the feature information of individual vertices are considered, which causes the surface of the reconstructed 3D model of the object to exhibit unnatural protrusions or depressions, leading to a poor reconstruction effect.
  • the embodiments of the present application provide a reconstruction method, device, terminal device and storage medium for an object model, which can avoid unnatural protrusions or depressions on the surface of the reconstructed three-dimensional model of the object and improve the reconstruction effect of the three-dimensional model.
  • the first aspect of the embodiments of the present application provides a method for reconstructing an object model, including:
  • the grid template including the initial position coordinates of each vertex of the original three-dimensional model and connection relationship data between the various vertices;
  • the encoding network being a neural network for extracting image features
  • the first feature matrix includes target feature vectors corresponding to each of the vertices
  • the first feature matrix is input into a pre-built decoding network for processing, and the second feature matrix is output.
  • the second feature matrix includes the target position coordinates corresponding to each of the vertices.
  • the decoding network is a neural network composed of a fully connected layer and an attention mechanism layer; for each of the vertices, the attention mechanism layer fuses the target feature vectors corresponding to each of the vertices according to the correlation between that vertex and each of the vertices, to obtain a fused target feature vector corresponding to the vertex, and the fused target feature vector is used to determine the target position coordinates corresponding to the vertex;
  • a target three-dimensional model corresponding to the target object is reconstructed according to the target position coordinates corresponding to each of the vertices and the connection relationship data between the vertices.
  • the original image containing the target object and a preset grid template are first obtained, and the feature vector of the original image is extracted; the feature vector is then fused with the position coordinates of each vertex in the grid template to obtain a feature matrix. The feature matrix is then processed by the decoding network, with an attention mechanism introduced during decoding to account for the positional correlation between the vertices of the object, yielding the decoded target position coordinates of each vertex. Finally, the 3D model corresponding to the target object is reconstructed from the acquired target position coordinates of each vertex and the previously acquired connection relationship data between the vertices.
  • the above process fuses feature vectors according to the correlation of the position coordinates between the vertices of the object, which takes into account the mutual influence between the vertices, thereby avoiding unnatural protrusions or depressions on the surface of the reconstructed 3D model and improving the reconstruction effect of the 3D model.
  • the fusion of the initial feature vector and the initial position coordinates of the respective vertices may further include:
  • the fusion of the initial feature vector and the initial position coordinates of each vertex may specifically be:
  • the spliced feature vectors are fused with the initial position coordinates of each vertex.
  • the merging of the initial feature vector and the initial position coordinates of each vertex to obtain the first feature matrix may include:
  • the decoding network includes a plurality of cascaded decoding modules, each of which comprises, in turn, a fully connected layer, an attention mechanism layer, and a batch normalization layer; inputting the first feature matrix into the pre-built decoding network for processing and outputting the second feature matrix may include:
  • the first intermediate matrix includes target feature vectors corresponding to each of the vertices
  • the first intermediate matrix is input to the attention mechanism layer of the first decoding module for processing
  • a second intermediate matrix is output, which may include:
  • for each of the vertices, the correlation weights between that vertex and each of the other vertices are calculated according to a trainable weight matrix, and the target feature vectors corresponding to the vertices are then weighted and summed according to their corresponding correlation weights to obtain the fused target feature vector corresponding to the vertex; the second intermediate matrix is a matrix composed of the fused target feature vectors corresponding to each of the vertices.
  • after the target three-dimensional model corresponding to the target object is reconstructed, the method may further include:
  • the smoothing loss is calculated according to the sizes of all the dihedral angles, which may specifically be:
  • the smoothing loss is calculated by the following formula:
  • L smooth represents the smoothing loss
  • θ_i,j represents the dihedral angle between any two planes i, j of the target 3D model
  • F represents all planes of the target 3D model.
  • the second aspect of the embodiment of the present application provides an object model reconstruction device, including:
  • a data acquisition module configured to acquire a preset grid template and an original image containing the target object, the grid template including the initial position coordinates of each vertex of the original three-dimensional model and the connection relationship data between the various vertices;
  • a feature encoding module configured to input the original image into a pre-built encoding network for processing, and output an initial feature vector corresponding to the original image, and the encoding network is a neural network for extracting image features;
  • a vector fusion module configured to fuse the initial feature vector and the initial position coordinates of each vertex to obtain a first feature matrix, the first feature matrix includes target feature vectors corresponding to each of the vertices;
  • a feature decoding module configured to input the first feature matrix into a pre-built decoding network for processing and output a second feature matrix, the second feature matrix including the target position coordinates corresponding to each of the vertices; the decoding network is a neural network comprising a fully connected layer and an attention mechanism layer, and for each of the vertices the attention mechanism layer fuses the target feature vectors corresponding to each of the vertices according to the correlation between that vertex and each of the vertices, to obtain the fused target feature vector corresponding to the vertex; the fused target feature vector is used to determine the target position coordinates corresponding to the vertex;
  • the model reconstruction module is used to reconstruct the target three-dimensional model corresponding to the target object according to the target position coordinates corresponding to each of the vertices and the connection relationship data between the vertices.
  • the third aspect of the embodiments of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and operable on the processor, when the processor executes the computer program
  • the object model reconstruction method provided in the first aspect of the embodiment of the present application is implemented.
  • the fourth aspect of the embodiments of the present application provides a computer-readable storage medium, which stores a computer program; when the computer program is executed by a processor, it implements the object model reconstruction method provided in the first aspect of the embodiments of the present application.
  • a fifth aspect of the embodiments of the present application provides a computer program product, which, when the computer program product is run on a terminal device, causes the terminal device to execute the object model reconstruction method described in the first aspect of the embodiments of the present application.
  • FIG. 1 is a flow chart of a method for reconstructing an object model provided in an embodiment of the present application
  • FIG. 2 is a schematic structural diagram of an encoding network provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a residual module provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a decoding network provided by an embodiment of the present application.
  • Fig. 5 is a schematic diagram of the processing of the attention mechanism layer provided by the embodiment of the present application.
  • Fig. 6 is a schematic diagram of the operation of the object model reconstruction method provided by the embodiment of the present application.
  • Fig. 7 is a schematic diagram of the processing effect of the object model reconstruction method provided by the embodiment of the present application.
  • Fig. 8 is a comparison diagram of the 3D model reconstruction results obtained by the present application and by the original Total3D model in the prior art.
  • FIG. 9 is a structural diagram of an object model reconstruction device provided in an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a terminal device provided by an embodiment of the present application.
  • the present application proposes a reconstruction method, device, terminal device and storage medium for an object model, which can avoid unnatural protrusions or depressions on the surface of the reconstructed three-dimensional model of the object and improve the reconstruction effect of the three-dimensional model. It should be understood that the method embodiments of the present application are executed by various types of terminal devices or servers, such as mobile phones, tablet computers, notebook computers, desktop computers, and wearable devices.
  • FIG. 1 shows a method for reconstructing an object model provided by an embodiment of the present application, including:
  • the grid template includes the initial position coordinates of each vertex of the original three-dimensional model and connection relationship data between the various vertexes.
  • the grid template can be a Mesh file, which stores the vertex positions and the connection relationship between vertices of the original 3D model.
  • the original 3D model can be a model of various shapes such as a sphere, a cube, or a cuboid; to make the distribution of vertex positions relatively uniform, a sphere-shaped original 3D model is generally recommended.
  • the grid template includes the 3D position coordinates of each of the N vertices and the connection relationship data between the N vertices; how the N vertices are connected can be determined from the connection relationship data, so that the corresponding 3D model can be obtained.
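The grid template described above can be sketched as plain arrays of vertex coordinates plus connectivity. The icosahedron below is only a hypothetical stand-in for the sphere-shaped original 3D model; the counts and values are illustrative, not from the patent:

```python
import numpy as np

phi = (1 + 5 ** 0.5) / 2  # golden ratio, used to place icosahedron vertices

# 12 vertices of an icosahedron (a coarse sphere-like original 3D model),
# projected onto the unit sphere so vertex positions are fairly uniform
vertices = np.array([
    [-1,  phi, 0], [1,  phi, 0], [-1, -phi, 0], [1, -phi, 0],
    [0, -1,  phi], [0, 1,  phi], [0, -1, -phi], [0, 1, -phi],
    [ phi, 0, -1], [ phi, 0, 1], [-phi, 0, -1], [-phi, 0, 1],
], dtype=np.float64)
vertices /= np.linalg.norm(vertices, axis=1, keepdims=True)

# Connection relationship data: each row lists the 3 vertex indices of one face
faces = np.array([
    [0, 11, 5], [0, 5, 1], [0, 1, 7], [0, 7, 10], [0, 10, 11],
    [1, 5, 9], [5, 11, 4], [11, 10, 2], [10, 7, 6], [7, 1, 8],
    [3, 9, 4], [3, 4, 2], [3, 2, 6], [3, 6, 8], [3, 8, 9],
    [4, 9, 5], [2, 4, 11], [6, 2, 10], [8, 6, 7], [9, 8, 1],
])

print(vertices.shape, faces.shape)  # (12, 3) (20, 3)
```

In practice the template would be loaded from a Mesh file with many more vertices (the text later uses 2562).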
  • an original image containing a target object which is any type of object whose corresponding 3D model needs to be reconstructed, such as a sofa, a table, or a bed.
  • the original image may be an RGB image or a grayscale image of the target object.
  • the original image is input into a pre-built encoding network for processing to obtain the feature vector corresponding to the original image.
  • the encoding network is a neural network for extracting image features.
  • images are processed through convolutional layers, pooling layers, and fully connected layers to extract image features and obtain the corresponding feature vectors. This application does not limit the type and structure of the encoding network.
  • a schematic diagram of the structure of a coding network provided in an embodiment of the present application is shown in Figure 2.
  • the input original image with a dimension of 224*224*3 passes through several convolutional layers, ReLU activation function layers, and maximum pooling layers of the coding network.
  • feature data of 1*1*1024 is finally obtained, which can be regarded as a vector of 1024 elements, that is, the initial feature vector corresponding to the 224*224*3 original image.
  • multiple stacked residual modules can also be added to the encoding network structure shown in Figure 2; the structure of each residual module is shown schematically in Figure 3.
  • the input feature map is processed by two 3*3 convolution blocks with edge padding to extract local features; features are then integrated and screened through the pooling layer to reduce the dimension of the image features. The output of each residual module is added to its original input, forming a new data transmission path that endows the residual network with the ability of identity mapping.
  • the residual network model ResNet-18 and its pre-trained weights provided by the PyTorch framework can be used as the encoding network.
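The identity-mapping data path of the residual module can be illustrated with a minimal sketch; the linear maps below merely stand in for the 3*3 convolution blocks (an assumption made for brevity, not the patent's exact layers):

```python
import numpy as np

def conv_like(x, w):
    # Stand-in for a 3x3 convolution with edge padding followed by ReLU:
    # here a simple linear map, enough to show the residual data path.
    return np.maximum(x @ w, 0.0)

def residual_block(x, w1, w2):
    # The output of the module is added to its original input
    # (the identity mapping described in the text).
    out = conv_like(x, w1)
    out = out @ w2
    return out + x

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
y = residual_block(x, rng.normal(size=(8, 8)), rng.normal(size=(8, 8)))
print(y.shape)  # (4, 8)
```

With zero weights the block reduces exactly to the identity, which is what makes residual networks easy to train.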
  • the feature vector is fused with the initial position coordinates of each vertex in the grid template to obtain a first feature matrix, and the first feature matrix includes target feature vectors corresponding to each of the vertices.
  • the initial position coordinates (x, y, z) of a vertex can be regarded as a vector of 3 elements, so the vector of 3 elements and the initial feature vector can be fused in a splicing manner to obtain a new vector, that is, the target feature vector.
  • Each target feature vector corresponding to each different vertex may form a matrix, that is, the first feature matrix.
  • the merging of the initial feature vector and the initial position coordinates of each vertex to obtain the first feature matrix may include:
  • the initial position coordinates of each vertex are expressed as a vector of 3 elements
  • the initial position coordinates of N vertices can be expressed as a matrix of N*3
  • the number of elements of the initial feature vector is assumed to be X
  • an N*(X+3) matrix will be obtained after splicing in the second dimension as the first feature matrix.
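This splicing step can be sketched roughly as follows (array names are hypothetical; the shapes follow the text):

```python
import numpy as np

# Broadcast the image feature vector to every vertex and concatenate it
# with the N*3 vertex coordinates along the second dimension.
N, X = 2562, 1024                     # vertex count and feature length (from the text)
feat = np.random.rand(X)              # initial feature vector from the encoder
coords = np.random.rand(N, 3)         # initial position coordinates, N*3

tiled = np.tile(feat, (N, 1))         # repeat the feature for each vertex -> N*X
first_feature_matrix = np.concatenate([tiled, coords], axis=1)  # N*(X+3)
print(first_feature_matrix.shape)     # (2562, 1027)
```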
  • the fusion of the initial feature vector and the initial position coordinates of the respective vertices may further include:
  • the fusion of the initial feature vector and the initial position coordinates of each vertex may specifically be:
  • the spliced feature vectors are fused with the initial position coordinates of each vertex.
  • a category vector can also be concatenated with the initial feature vector, and the concatenated vector is then fused with the initial position coordinates.
  • each object category corresponds to a unique category vector, which can be in the form of one-hot encoding. For example, if the data set to be processed contains images of 4 types of objects, namely tables, chairs, computers, and airplanes, the category vector corresponding to the table can be preset as (0, 0, 0, 1), that of the chair as (0, 0, 1, 0), that of the computer as (0, 1, 0, 0), and that of the airplane as (1, 0, 0, 0). If the target object in the currently processed original image is a table, the category vector (0, 0, 0, 1) corresponding to the table is obtained and spliced with the initial feature vector.
  • An example to illustrate the specific splicing method is as follows: Assume that there are 2562 vertices in total, and the initial position coordinates of each vertex are expressed as a vector of 3 elements, then the initial position coordinates of 2562 vertices can be expressed as a 2562*3 matrix.
  • the number of elements of the initial feature vector is 1024, and the number of elements of the category vector is 9.
  • the initial feature vector and the category vector are spliced to obtain a new feature vector with 1033 elements; the new feature vector is then concatenated with the 2562*3 matrix along the second dimension to obtain a 2562*1036 matrix as the first feature matrix.
  • Each 1*1036 vector in the first feature matrix is a semantic vector corresponding to each model vertex.
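The one-hot category splicing walked through above can be sketched as below; the chosen class index is an illustrative assumption, while the sizes (1024-element feature, 9-element category vector, 2562 vertices) come from the text:

```python
import numpy as np

N, X, K = 2562, 1024, 9                 # vertices, feature length, number of classes
init_feat = np.random.rand(X)           # initial feature vector from the encoder
category = np.zeros(K)
category[3] = 1.0                       # hypothetical one-hot class (e.g. "table")
coords = np.random.rand(N, 3)           # initial vertex position coordinates

spliced = np.concatenate([init_feat, category])               # 1033 elements
semantic = np.concatenate([np.tile(spliced, (N, 1)), coords], axis=1)
print(semantic.shape)                   # (2562, 1036)
```

Each 1*1036 row of `semantic` is then the semantic vector associated with one model vertex.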
  • the decoding network is a neural network including a fully connected layer and an attention mechanism layer; for each of the vertices, the attention mechanism layer fuses the target feature vectors corresponding to each of the vertices according to the correlation between that vertex and each of the vertices, to obtain a fused target feature vector corresponding to the vertex, and the fused target feature vector is used to determine the target position coordinates corresponding to the vertex.
  • Ordinary decoding networks usually use a multi-layer stacked fully connected network to predict the offset of the vertex coordinates of the grid template, and obtain the converted target position coordinates.
  • this method can only consider the global information of the image and the information of a single target point when predicting, and does not consider the points related to the target point, especially the mutual influence between locally adjacent points, which easily leads to unnatural bumps or depressions on the surface of the reconstructed 3D model.
  • this application adds an attention mechanism layer to the decoding network to capture the positional interaction between different vertices of the same object.
  • the decoding network includes a plurality of cascaded decoding modules, each of which comprises, in turn, a fully connected layer, an attention mechanism layer, and a batch normalization layer; inputting the first feature matrix into the pre-built decoding network for processing and outputting the second feature matrix may include:
  • the decoding network includes multiple stacked decoding modules, where each decoding module is sequentially composed of a fully connected layer, an attention mechanism layer and a batch normalization layer.
  • the fully connected layer can be realized by a 1*1 convolution to predict the coordinate offset of a single vertex; the attention mechanism layer then filters and extracts the several vertices most relevant to the current vertex (usually locally adjacent vertices), and their coordinate information is spliced with the original output. The result is processed by the batch normalization layer (the Batch Normalization layer) so that the data conform to a Gaussian distribution before being fed into the subsequent network.
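The fully-connected / attention / batch-normalization flow of one decoding module might be sketched as below. This is a simplified stand-in, not the patent's exact layer: the attention here attends over all vertices (including the vertex itself), and the layer sizes are arbitrary:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def decoding_module(x, w_fc, w_attn):
    # Fully connected layer (the text notes a 1*1 convolution is equivalent)
    h = x @ w_fc                                   # N*C
    # Simplified attention layer: correlation weights, then weighted sum
    attn = softmax(h @ w_attn @ h.T, axis=1) @ h   # N*C
    o = np.concatenate([h, attn], axis=1)          # splice -> N*2C
    # Batch normalization over the vertex dimension
    return (o - o.mean(axis=0)) / (o.std(axis=0) + 1e-5)

rng = np.random.default_rng(1)
x = rng.normal(size=(100, 64))
out = decoding_module(x, rng.normal(size=(64, 32)), rng.normal(size=(32, 32)))
print(out.shape)  # (100, 64)
```

Several such modules are cascaded, each feeding its normalized output into the next.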
  • the first intermediate matrix includes target feature vectors corresponding to each of the vertices
  • the first intermediate matrix is input to the attention mechanism layer of the first decoding module for processing
  • a second intermediate matrix is output, which may include:
  • for each of the vertices, the correlation weights between that vertex and each of the other vertices are calculated according to a trainable weight matrix, and the target feature vectors corresponding to the vertices are then weighted and summed according to their corresponding correlation weights to obtain the fused target feature vector corresponding to the vertex; the second intermediate matrix is a matrix composed of the fused target feature vectors corresponding to each of the vertices.
  • FIG. 5 is a schematic diagram of the processing of the attention mechanism layer adopted in this application.
  • the first intermediate matrix I ∈ R^(N*C) is obtained, where N represents the number of vertices, and C represents the number of elements of the target feature vector corresponding to each vertex.
  • the second intermediate matrix A ∈ R^(N*C) is obtained, and the two matrices are then spliced in the second dimension to obtain the third intermediate matrix O ∈ R^(N*2C).
  • the third intermediate matrix O is input to the batch normalization layer for processing, and then connected to the next decoding module to perform the same processing, and so on, and finally output the second feature matrix.
  • This process can be called point-to-point attention mechanism processing.
  • the specific processing method is: for a certain vertex P, a trainable weight matrix is used to calculate the correlation weights between vertex P and the other N-1 vertices (excluding vertex P), and the target feature vectors corresponding to those N-1 vertices are then weighted and summed according to their corresponding correlation weights to obtain the fused target feature vector corresponding to vertex P.
  • the dimensionality of the feature vector is unchanged (dimension is still C).
  • N fused target feature vectors will be obtained, which form the second intermediate matrix A ∈ R^(N*C).
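The point-to-point attention step can be sketched as follows, assuming the bilinear form e[i, j] = p_i W p_jᵀ (an assumption, but one consistent with W being a square C*C matrix, 1036*1036 in the text). Small sizes are used here for illustration; the text uses N = 2562 and C = 1036:

```python
import numpy as np

rng = np.random.default_rng(0)
N, C = 200, 64
P = rng.normal(size=(N, C)) * 0.1    # target feature vectors, one row per vertex
W = rng.normal(size=(C, C)) * 0.01   # trainable weight matrix (random stand-in)

E = P @ W @ P.T                      # correlation weight e[i, j] for every vertex pair
np.fill_diagonal(E, -np.inf)         # each vertex attends to the other N-1 vertices
E -= E.max(axis=1, keepdims=True)    # numerically stable softmax normalization
A_w = np.exp(E)
A_w /= A_w.sum(axis=1, keepdims=True)
A = A_w @ P                          # weighted sum -> fused target feature vectors
print(A.shape)                       # (200, 64)
```

Note the feature dimension is unchanged by the fusion, matching the text's observation that the dimension is still C.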
  • e_i,j represents the correlation weight between any two vertices i and j among the N vertices
  • p i represents the target feature vector corresponding to vertex i
  • p j represents the target feature vector corresponding to vertex j
  • W is a trainable weight matrix
  • the initial value of the weight matrix can be manually set, and then the value of the weight matrix is iteratively updated during the training process of the decoding network.
  • the weight matrix W is a 1036*1036 matrix, so that the calculated correlation weight is a scalar indicating the correlation between vertices i and j.
  • a_i represents e_i after softmax normalization
  • e_i is a vector obtained by concatenating e_i,j along the j dimension, representing the correlation weights between vertex i and all other vertices except vertex i.
  • the fused target feature vector corresponding to vertex i can be expressed by the following formula (1.3):
  • α_i represents the fused target feature vector corresponding to vertex i
  • a i,j represents the reduced correlation weight between vertex j and vertex i.
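The formula images themselves are not reproduced in this text, but the surviving variable definitions (e_i,j, p_i, W, the softmax-normalized a_i,j, and the fused vector of formula (1.3)) suggest the following plausible reconstruction. This is an assumption, not the patent's verbatim formulas:

```latex
e_{i,j} = p_i \, W \, p_j^{\top} \quad (1.1), \qquad
a_{i,j} = \frac{\exp(e_{i,j})}{\sum_{k \neq i} \exp(e_{i,k})} \quad (1.2), \qquad
\alpha_i = \sum_{j \neq i} a_{i,j} \, p_j \quad (1.3)
```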
  • each stacked decoding module in the decoding network gradually performs a dimension reduction operation on the matrix (realized by the fully connected layer), finally obtaining a 2562*3 result matrix that represents the converted three-dimensional position coordinates corresponding to the 2562 vertices.
  • the position of each vertex in the reconstructed 3D model can be determined, and, combined with the connection relationship data between the vertices contained in the grid template, a new 3D model can be constructed as the target three-dimensional model corresponding to the target object.
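Once the target position coordinates and the connection relationship data are available, assembling the model is a matter of serializing them. A minimal sketch using the Wavefront OBJ convention (a format choice not specified in the patent) could look like:

```python
def mesh_to_obj(vertices, faces):
    # "v x y z" lines for vertex coordinates, then "f i j k" lines for faces;
    # OBJ face indices are 1-based.
    lines = ["v {:.6f} {:.6f} {:.6f}".format(*v) for v in vertices]
    lines += ["f {} {} {}".format(*(i + 1 for i in f)) for f in faces]
    return "\n".join(lines)

obj_text = mesh_to_obj([(0, 0, 0), (1, 0, 0), (0, 1, 0)], [(0, 1, 2)])
print(obj_text.splitlines()[-1])  # f 1 2 3
```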
  • after the target three-dimensional model corresponding to the target object is reconstructed, the method may also include:
  • because the coordinates of each vertex and the connection relationships between the vertices are known, the size of each dihedral angle of the target three-dimensional model can be easily calculated. The smoothing loss can then be calculated according to the sizes of all dihedral angles and used as the objective function to optimize and update the parameters of the decoding network.
  • the smoothing loss is calculated according to the sizes of all the dihedral angles, which may specifically be:
  • L smooth represents the smoothing loss
  • θ_i,j represents the dihedral angle between any two planes i, j of the target 3D model
  • F represents all planes of the target 3D model.
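A hedged sketch of the smoothing loss follows. The exact formula image is not reproduced in this text, so this assumes, as in prior mesh-reconstruction work, that each dihedral angle θ is penalized by (cos θ + 1)², so a perfectly flat pair of faces (θ = π) contributes zero loss:

```python
import numpy as np

def smoothing_loss(vertices, faces):
    # Compute one face normal per triangle, find the pairs of faces that
    # share an edge, and penalize each dihedral angle's deviation from pi.
    normals = {}
    edge_faces = {}
    for fi, (a, b, c) in enumerate(faces):
        n = np.cross(vertices[b] - vertices[a], vertices[c] - vertices[a])
        normals[fi] = n / np.linalg.norm(n)
        for e in [(a, b), (b, c), (c, a)]:
            edge_faces.setdefault(tuple(sorted(e)), []).append(fi)
    loss = 0.0
    for fs in edge_faces.values():
        if len(fs) == 2:
            cos_n = np.clip(normals[fs[0]] @ normals[fs[1]], -1.0, 1.0)
            theta = np.pi - np.arccos(cos_n)   # dihedral angle between the faces
            loss += (np.cos(theta) + 1.0) ** 2
    return loss

# Two coplanar triangles sharing an edge: dihedral angle pi, loss ~ 0
V = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]], float)
F = [(0, 1, 2), (1, 3, 2)]
print(round(smoothing_loss(V, F), 6))  # 0.0
```

Bending the shared edge raises the loss, which is what drives the network toward smoother surfaces during training.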
  • the embodiment of the present application introduces a smoothing loss to train the neural network and constrain the smoothness of the object surface, which can make the surface of the reconstructed 3D model smoother and improve the model reconstruction effect.
  • the original image containing the target object and a preset grid template are first obtained, and the feature vector of the original image is extracted; the feature vector is then fused with the position coordinates of each vertex in the grid template to obtain a feature matrix. The feature matrix is then processed by the decoding network, with an attention mechanism introduced during decoding to account for the positional correlation between the vertices of the object, yielding the decoded target position coordinates of each vertex. Finally, the 3D model corresponding to the target object is reconstructed from the acquired target position coordinates of each vertex and the previously acquired connection relationship data between the vertices.
  • the above process fuses feature vectors according to the correlation of the position coordinates between the vertices of the object, which takes into account the mutual influence between the vertices, thereby avoiding unnatural protrusions or depressions on the surface of the reconstructed 3D model and improving the reconstruction effect of the 3D model.
  • FIG. 6 is a schematic diagram of the operation of the object model reconstruction method provided by the embodiment of the present application.
  • Each decoding module includes a fully connected layer, an attention mechanism layer, and a batch normalization layer in sequence.
  • the attention mechanism is used to obtain the converted target position coordinates of each vertex; finally, the 3D model corresponding to the target object is reconstructed according to the target position coordinates corresponding to each vertex and the connection relationship data between the vertices.
  • the smoothing loss can be calculated according to each dihedral angle in the reconstructed 3D model, and the decoding network can be optimized and trained according to the smoothing loss, so as to improve the smoothness of the surface of the obtained 3D model.
  • FIG. 7 is a schematic diagram of the processing effect of the object model reconstruction method proposed in this application.
  • the five 3D models at the top of Figure 7 are reconstructed three-dimensional models obtained without using the inter-point attention mechanism
  • the five three-dimensional models at the bottom of Figure 7 are corresponding reconstructed three-dimensional models obtained by using the inter-point attention mechanism.
  • there are many unnatural protrusions and depressions in the five 3D models at the top of Figure 7 (see the dotted-line boxes in the figure), while these protrusions and depressions do not exist in the five 3D models at the bottom of Figure 7; the reconstruction effect of the latter 3D models is better.
  • Fig. 8 is a comparison diagram of the 3D model reconstruction results obtained by the present application and by the original Total3D model in the prior art, where the left column shows the input images, the middle column the 3D model reconstruction results obtained using the original Total3D model, and the right column the reconstruction results obtained by the present application. It can be seen that the model proposed in this application generates more accurate and smoother three-dimensional object models.
  • the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of each process should be determined by its functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
  • the above mainly describes a method for reconstructing an object model, and a device for reconstructing an object model will be described below.
  • an embodiment of an object model reconstruction device in the embodiment of the present application includes:
  • a data acquisition module 801 configured to acquire a preset grid template and an original image containing the target object, the grid template including the initial position coordinates of each vertex of the original 3D model and the connection relationship data between the various vertices;
  • a feature encoding module 802 configured to input the original image into a pre-built encoding network for processing, and output an initial feature vector corresponding to the original image, and the encoding network is a neural network for extracting image features;
  • a vector fusion module 803, configured to fuse the initial feature vectors and the initial position coordinates of the respective vertices to obtain a first feature matrix, the first feature matrix including target feature vectors corresponding to each of the vertices;
  • a feature decoding module 804 configured to input the first feature matrix into a pre-built decoding network for processing, and output a second feature matrix, the second feature matrix includes target position coordinates corresponding to each of the vertices, and the decoding
  • network is a neural network including a fully connected layer and an attention mechanism layer; for each of the vertices, the attention mechanism layer fuses the target feature vectors corresponding to each of the vertices according to the correlation between that vertex and each of the vertices, to obtain the fused target feature vector corresponding to the vertex; the fused target feature vector is used to determine the target position coordinates corresponding to the vertex;
  • the model reconstruction module 805 is configured to reconstruct the target three-dimensional model corresponding to the target object according to the target position coordinates corresponding to each of the vertices and the connection relationship data between the vertices.
  • the reconstruction device of the object model may also include:
  • a category vector acquisition module configured to acquire a category vector corresponding to the target object, where the category vector is used to represent the object category to which the target object belongs;
  • a vector splicing module configured to splice the category vector and the initial feature vector to obtain a spliced feature vector
  • the vector fusion module can specifically be used for:
  • the spliced feature vectors are fused with the initial position coordinates of each vertex.
  • the vector fusion module may include:
  • a matrix representation unit configured to represent the initial position coordinates of each vertex as a matrix of dimension N*3, where N is the number of vertices;
  • a vector splicing unit configured to splice the initial feature vector and the matrix of dimension N*3 in the second dimension to obtain the first feature matrix of dimension N*(3+X), where X is the number of elements of the initial feature vector.
  • the decoding network includes a plurality of cascaded decoding modules, each of which includes, in sequence, a fully connected layer, an attention mechanism layer, and a batch normalization layer, and the feature decoding module may include:
  • a first processing unit configured to input the first feature matrix into the fully connected layer of the first decoding module of the decoding network for processing, and output a first intermediate matrix
  • the second processing unit is configured to input the first intermediate matrix into the attention mechanism layer of the first decoding module for processing, and output a second intermediate matrix;
  • a third processing unit configured to splice the second intermediate matrix and the first intermediate matrix to obtain a third intermediate matrix
  • a fourth processing unit configured to input the third intermediate matrix into the batch normalization layer of the first decoding module for processing to obtain a fourth intermediate matrix
  • a fifth processing unit configured to input the fourth intermediate matrix into the second decoding module of the decoding network and to continue with the same processing as in the first decoding module, until the second feature matrix output by the last decoding module of the decoding network is obtained.
  • the first intermediate matrix includes target feature vectors corresponding to each of the vertices
  • the second processing unit may specifically be used for:
  • for each of the vertices, the correlation weights between that vertex and all vertices are calculated according to a trainable weight matrix, and the target feature vectors of all vertices are then weighted and summed according to their respective correlation weights to obtain the fused target feature vector corresponding to that vertex; the second intermediate matrix is a matrix composed of the fused target feature vectors corresponding to each of the vertices.
  • the reconstruction device of the object model may also include:
  • a dihedral angle calculation module configured to calculate the size of all dihedral angles of the target three-dimensional model according to the position coordinates of each vertex of the target three-dimensional model
  • a smoothing loss calculation module configured to calculate the smoothing loss according to the size of all dihedral angles
  • a network parameter optimization module configured to optimize and update parameters of the decoding network based on the smoothing loss.
  • the smoothing loss calculation module is specifically configured to calculate the smoothing loss by the formula given in the description, where L_smooth represents the smoothing loss, θ_i,j represents the dihedral angle between any two planes i and j of the target three-dimensional model, and F represents all planes of the target three-dimensional model.
  • the embodiment of the present application also provides a computer-readable storage medium, the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, any method for reconstructing an object model as shown in FIG. 1 is implemented.
  • the embodiment of the present application also provides a computer program product, which, when the computer program product is run on a terminal device, enables the terminal device to implement any method for reconstructing an object model as shown in FIG. 1 .
  • Fig. 10 is a schematic diagram of a terminal device provided by an embodiment of the present application.
  • the terminal device 9 of this embodiment includes: a processor 90 , a memory 91 , and a computer program 92 stored in the memory 91 and operable on the processor 90 .
  • when the processor 90 executes the computer program 92, it implements the steps in the embodiments of the object model reconstruction method described above, such as steps 101 to 105 shown in FIG. 1.
  • alternatively, when the processor 90 executes the computer program 92, it realizes the functions of the modules/units in the above-mentioned device embodiments, for example the functions of modules 801 to 805 shown in FIG. 9.
  • the computer program 92 can be divided into one or more modules/units, and the one or more modules/units are stored in the memory 91 and executed by the processor 90 to complete the present application.
  • the one or more modules/units may be a series of computer program instruction segments capable of accomplishing specific functions, and the instruction segments are used to describe the execution process of the computer program 92 in the terminal device 9 .
  • the processor 90 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the memory 91 may be an internal storage unit of the terminal device 9, such as a hard disk or internal memory of the terminal device 9.
  • the memory 91 can also be an external storage device of the terminal device 9, such as a plug-in hard disk equipped on the terminal device 9, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, flash memory card (Flash Card), etc. Further, the memory 91 may also include both an internal storage unit of the terminal device 9 and an external storage device.
  • the memory 91 is used to store the computer program and other programs and data required by the terminal device.
  • the memory 91 can also be used to temporarily store data that has been output or will be output.
  • the disclosed devices and methods may be implemented in other ways.
  • the system embodiments described above are only illustrative.
  • the division of the modules or units is only a logical function division; in actual implementation there may be other division manners.
  • multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.
  • when the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments of the present application can also be completed by instructing relevant hardware through computer programs.
  • the computer programs can be stored in a computer-readable storage medium, and when a computer program is executed by the processor, the steps of the above method embodiments can be implemented.
  • the computer program includes computer program code, and the computer program code may be in the form of source code, object code, executable file or some intermediate form.
  • the computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media exclude electrical carrier signals and telecommunication signals.

Abstract

The present application relates to the technical field of image processing, and provides a method and apparatus for reconstructing an object model, a terminal device, and a storage medium. The method comprises: firstly, acquiring an original image including a target object and a preset grid template, and extracting a feature vector of the original image; then fusing the feature vector with the position coordinates of each vertex in the grid template to obtain a feature matrix; then processing the feature matrix with a decoding network, introducing an attention mechanism during decoding so as to take the positional correlation between the vertices of the object into consideration and obtain decoded target position coordinates for each vertex; and finally, reconstructing a three-dimensional model corresponding to the target object according to the obtained target position coordinates of each vertex and the previously acquired connection relationship data between the vertices. By means of the method, unnatural bulges or recesses on the surface of the reconstructed three-dimensional model of an object can be avoided, thereby improving the reconstruction effect of the three-dimensional model.

Description

Method and apparatus for reconstructing an object model, and terminal device and storage medium

Technical Field

The present application relates to the technical field of image processing, and in particular to a method and apparatus for reconstructing an object model, a terminal device, and a storage medium.

Background

The 3D model reconstruction technology of indoor scenes has great application value in fields such as virtual reality and human-computer interaction. At present, monocular 3D object model reconstruction methods based on deep learning are usually adopted, which generally use an end-to-end encoder-decoder computational model. However, when the decoder predicts the position distribution of a certain vertex on the object surface, usually only the global information of the object image and the feature information of that vertex are considered, which can cause unnatural bulges or depressions on the surface of the reconstructed 3D model and degrade the reconstruction effect.

Technical Problem

In view of this, the embodiments of the present application provide a method and apparatus for reconstructing an object model, a terminal device, and a storage medium, which can avoid unnatural bulges or depressions on the surface of the reconstructed three-dimensional model of an object and improve the reconstruction effect of the three-dimensional model.

Technical Solution
A first aspect of the embodiments of the present application provides a method for reconstructing an object model, including:

obtaining a preset grid template and an original image containing a target object, the grid template including the initial position coordinates of each vertex of an original three-dimensional model and connection relationship data between the vertices;

inputting the original image into a pre-built encoding network for processing and outputting an initial feature vector corresponding to the original image, the encoding network being a neural network for extracting image features;

fusing the initial feature vector with the initial position coordinates of the vertices to obtain a first feature matrix, the first feature matrix including a target feature vector corresponding to each vertex;

inputting the first feature matrix into a pre-built decoding network for processing and outputting a second feature matrix, the second feature matrix including target position coordinates corresponding to each vertex, where the decoding network is a neural network including a fully connected layer and an attention mechanism layer, and the attention mechanism layer is used, for each vertex, to fuse the target feature vectors of all vertices according to the correlation between each vertex and that vertex to obtain a fused target feature vector for that vertex, the fused target feature vector being used to determine the target position coordinates of that vertex;

reconstructing a target three-dimensional model corresponding to the target object according to the target position coordinates corresponding to each vertex and the connection relationship data between the vertices.

In the embodiments of the present application, an original image containing a target object and a preset grid template are first obtained, a feature vector of the original image is extracted, and the feature vector is then fused with the position coordinates of each vertex in the grid template to obtain a feature matrix. Next, a decoding network processes the feature matrix, introducing an attention mechanism during decoding so as to take the positional correlation between the vertices of the object into account, yielding decoded target position coordinates for each vertex. Finally, a three-dimensional model corresponding to the target object is reconstructed from the obtained target position coordinates of the vertices and the previously obtained connection relationship data between the vertices. Because the feature vectors are fused according to the correlation of the position coordinates between the vertices, the mutual influence between the vertices of the object is taken into account, which avoids unnatural bulges or depressions on the surface of the reconstructed three-dimensional model and improves the reconstruction effect.
In one embodiment of the present application, before the initial feature vector is fused with the initial position coordinates of the vertices, the method may further include:

obtaining a category vector corresponding to the target object, the category vector being used to represent the object category to which the target object belongs;

splicing the category vector and the initial feature vector to obtain a spliced feature vector;

and the fusing of the initial feature vector with the initial position coordinates of the vertices may specifically be:

fusing the spliced feature vector with the initial position coordinates of the vertices.

In one embodiment of the present application, fusing the initial feature vector with the initial position coordinates of the vertices to obtain the first feature matrix may include:

representing the initial position coordinates of the vertices as a matrix of dimension N*3, where N is the number of vertices;

splicing the initial feature vector and the N*3 matrix in the second dimension to obtain the first feature matrix of dimension N*(3+X), where X is the number of elements of the initial feature vector.
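The fusion just described, tiling the image feature vector across all vertices and splicing it with the N*3 coordinate matrix, can be sketched in PyTorch as follows; the vertex count 2562 and feature length 1024 are only illustrative values.

```python
import torch

def fuse_features(initial_feature, vertex_coords):
    # initial_feature: (X,) image feature vector from the encoder
    # vertex_coords:   (N, 3) initial vertex positions from the grid template
    n = vertex_coords.shape[0]
    # broadcast the image feature to every vertex: (N, X)
    tiled = initial_feature.unsqueeze(0).expand(n, -1)
    # splice along the second dimension: (N, 3 + X)
    return torch.cat([vertex_coords, tiled], dim=1)

coords = torch.rand(2562, 3)        # e.g. a sphere template with 2562 vertices
feat = torch.rand(1024)             # the 1*1*1024 encoder output, flattened
first_feature_matrix = fuse_features(feat, coords)
print(first_feature_matrix.shape)   # torch.Size([2562, 1027])
```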
In one embodiment of the present application, the decoding network includes a plurality of cascaded decoding modules, each decoding module including, in sequence, a fully connected layer, an attention mechanism layer, and a batch normalization layer, and inputting the first feature matrix into the pre-built decoding network for processing and outputting the second feature matrix may include:

inputting the first feature matrix into the fully connected layer of the first decoding module of the decoding network for processing and outputting a first intermediate matrix;

inputting the first intermediate matrix into the attention mechanism layer of the first decoding module for processing and outputting a second intermediate matrix;

splicing the second intermediate matrix and the first intermediate matrix to obtain a third intermediate matrix;

inputting the third intermediate matrix into the batch normalization layer of the first decoding module for processing to obtain a fourth intermediate matrix;

inputting the fourth intermediate matrix into the second decoding module of the decoding network and continuing with the same processing as in the first decoding module, until the second feature matrix output by the last decoding module of the decoding network is obtained.
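The cascade just described might look like the following sketch. The layer widths, the plain dot-product attention used inside each module, and the final linear head mapping features to 3-D coordinates are assumptions for illustration; the source fixes only the order fully connected layer, attention mechanism layer, splice, batch normalization layer.

```python
import torch
import torch.nn as nn

class DecodeModule(nn.Module):
    """One decoding module: fully connected -> attention -> splice -> batch norm."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.fc = nn.Linear(in_dim, out_dim)
        self.bn = nn.BatchNorm1d(2 * out_dim)

    def forward(self, x):                      # x: (N, in_dim), one row per vertex
        m1 = self.fc(x)                        # first intermediate matrix (N, out_dim)
        # simple dot-product attention over vertices (an illustrative assumption)
        attn = torch.softmax(m1 @ m1.T / m1.shape[1] ** 0.5, dim=1)
        m2 = attn @ m1                         # second intermediate matrix (N, out_dim)
        m3 = torch.cat([m2, m1], dim=1)        # third intermediate matrix (N, 2*out_dim)
        return self.bn(m3)                     # fourth intermediate matrix

decoder = nn.Sequential(
    DecodeModule(1027, 512),                   # input: N*(3+X) with X = 1024
    DecodeModule(1024, 256),
    nn.Linear(512, 3),                         # assumed head mapping to 3-D coordinates
)
out = decoder(torch.rand(2562, 1027))
print(out.shape)                               # torch.Size([2562, 3])
```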
Further, the first intermediate matrix includes the target feature vector corresponding to each vertex, and inputting the first intermediate matrix into the attention mechanism layer of the first decoding module for processing and outputting the second intermediate matrix may include:

for each vertex, calculating the correlation weights between that vertex and all vertices according to a trainable weight matrix, and then performing a weighted summation of the target feature vectors of all vertices according to their respective correlation weights to obtain a fused target feature vector corresponding to that vertex, the second intermediate matrix being a matrix composed of the fused target feature vectors corresponding to the vertices.
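A minimal sketch of this attention step is given below. Parameterizing the correlation weights with two trainable matrices in a query/key form is an assumption; the source states only that the weights are computed from a trainable weight matrix and that the fused vector is a weighted sum of all vertices' target feature vectors.

```python
import torch
import torch.nn as nn

class VertexAttention(nn.Module):
    """Attention over vertices: per-pair correlation weights from trainable
    weight matrices, then a weighted sum of all target feature vectors."""
    def __init__(self, dim):
        super().__init__()
        self.wq = nn.Linear(dim, dim, bias=False)   # trainable weight matrices
        self.wk = nn.Linear(dim, dim, bias=False)

    def forward(self, feats):                        # feats: (N, dim)
        q, k = self.wq(feats), self.wk(feats)
        # correlation weight of every vertex pair, normalized per vertex
        weights = torch.softmax(q @ k.T / feats.shape[1] ** 0.5, dim=1)   # (N, N)
        # weighted sum over all vertices -> fused target feature vectors
        return weights @ feats                       # (N, dim)

fused = VertexAttention(128)(torch.rand(2562, 128))
print(fused.shape)   # torch.Size([2562, 128])
```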
In one embodiment of the present application, after the target three-dimensional model corresponding to the target object is reconstructed, the method may further include:

calculating the sizes of all dihedral angles of the target three-dimensional model according to the position coordinates of the vertices of the target three-dimensional model;

calculating a smoothing loss according to the sizes of all the dihedral angles;

optimizing and updating the parameters of the decoding network based on the smoothing loss.

Further, the smoothing loss may be calculated from the sizes of all the dihedral angles by the following formula:

Figure PCTCN2021093783-appb-000001

where L_smooth represents the smoothing loss, θ_i,j represents the dihedral angle between any two planes i and j of the target three-dimensional model, and F represents all planes of the target three-dimensional model.
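The formula itself is rendered only as an image in the source, so the sketch below substitutes one common smoothness penalty, the sum of (1 - cos θ_i,j)^2 over adjacent face pairs, purely as an assumption; it is zero when neighboring faces are coplanar and grows as the surface creases.

```python
import numpy as np

def dihedral_cosines(vertices, faces):
    # Cosine of the angle between unit normals of every pair of faces that
    # share an edge. vertices: (V, 3) floats, faces: (F, 3) vertex indices.
    v = vertices
    normals = np.cross(v[faces[:, 1]] - v[faces[:, 0]],
                       v[faces[:, 2]] - v[faces[:, 0]])
    normals /= np.linalg.norm(normals, axis=1, keepdims=True)
    edge_to_faces = {}
    for fi, face in enumerate(faces):
        for a, b in ((face[0], face[1]), (face[1], face[2]), (face[2], face[0])):
            edge_to_faces.setdefault(frozenset((int(a), int(b))), []).append(fi)
    pairs = [fs for fs in edge_to_faces.values() if len(fs) == 2]
    return np.array([normals[i] @ normals[j] for i, j in pairs])

def smoothing_loss(vertices, faces):
    # Assumed penalty: (1 - cos θ)^2 summed over adjacent face pairs; zero
    # when neighboring faces are coplanar (a perfectly smooth patch).
    return float(np.sum((1.0 - dihedral_cosines(vertices, faces)) ** 2))
```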
A second aspect of the embodiments of the present application provides an apparatus for reconstructing an object model, including:

a data acquisition module configured to obtain a preset grid template and an original image containing a target object, the grid template including the initial position coordinates of each vertex of an original three-dimensional model and connection relationship data between the vertices;

a feature encoding module configured to input the original image into a pre-built encoding network for processing and output an initial feature vector corresponding to the original image, the encoding network being a neural network for extracting image features;

a vector fusion module configured to fuse the initial feature vector with the initial position coordinates of the vertices to obtain a first feature matrix, the first feature matrix including a target feature vector corresponding to each vertex;

a feature decoding module configured to input the first feature matrix into a pre-built decoding network for processing and output a second feature matrix, the second feature matrix including target position coordinates corresponding to each vertex, where the decoding network is a neural network including a fully connected layer and an attention mechanism layer, and the attention mechanism layer is used, for each vertex, to fuse the target feature vectors of all vertices according to the correlation between each vertex and that vertex to obtain a fused target feature vector for that vertex, the fused target feature vector being used to determine the target position coordinates of that vertex;

a model reconstruction module configured to reconstruct a target three-dimensional model corresponding to the target object according to the target position coordinates corresponding to each vertex and the connection relationship data between the vertices.

A third aspect of the embodiments of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the method for reconstructing an object model provided by the first aspect of the embodiments of the present application.

A fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for reconstructing an object model provided by the first aspect of the embodiments of the present application.

A fifth aspect of the embodiments of the present application provides a computer program product which, when run on a terminal device, causes the terminal device to execute the method for reconstructing an object model described in the first aspect of the embodiments of the present application.

It can be understood that, for the beneficial effects of the second to fifth aspects, reference may be made to the relevant description of the first aspect, and details are not repeated here.
Brief Description of the Drawings

Fig. 1 is a flowchart of a method for reconstructing an object model provided by an embodiment of the present application;

Fig. 2 is a schematic structural diagram of an encoding network provided by an embodiment of the present application;

Fig. 3 is a schematic structural diagram of a residual module provided by an embodiment of the present application;

Fig. 4 is a schematic structural diagram of a decoding network provided by an embodiment of the present application;

Fig. 5 is a schematic diagram of the processing of the attention mechanism layer provided by an embodiment of the present application;

Fig. 6 is a schematic operation diagram of the method for reconstructing an object model provided by an embodiment of the present application;

Fig. 7 is a schematic diagram of the processing effect of the method for reconstructing an object model provided by an embodiment of the present application;

Fig. 8 is a comparison of three-dimensional model reconstruction results obtained by the present application and by the original Total3D model in the prior art;

Fig. 9 is a structural diagram of an apparatus for reconstructing an object model provided by an embodiment of the present application;

Fig. 10 is a schematic diagram of a terminal device provided by an embodiment of the present application.
本发明的实施方式Embodiments of the present invention
以下描述中,为了说明而不是为了限定,提出了诸如特定系统结构、技术之类的具体细 节,以便透彻理解本申请实施例。然而,本领域的技术人员应当清楚,在没有这些具体细节的其它实施例中也可以实现本申请。在其它情况中,省略对众所周知的系统、装置、电路以及方法的详细说明,以免不必要的细节妨碍本申请的描述。另外,在本申请说明书和所附权利要求书的描述中,术语“第一”、“第二”、“第三”等仅用于区分描述,而不能理解为指示或暗示相对重要性。In the following description, for the purpose of illustration rather than limitation, specific details such as specific system structures and technologies are presented, so as to thoroughly understand the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail. In addition, in the description of the specification and appended claims of the present application, the terms "first", "second", "third" and so on are only used to distinguish descriptions, and should not be understood as indicating or implying relative importance.
本申请提出一种物体模型的重建方法、装置、终端设备和存储介质,能够避免重建得到的物体三维模型的表面出现不自然的凸起或凹陷现象,提高三维模型的重建效果。应当理解,本申请各个方法实施例的执行主体为各种类型的终端设备或服务器,比如手机、平板电脑、笔记本电脑、台式电脑和可穿戴设备等。The present application proposes a reconstruction method, device, terminal equipment and storage medium of an object model, which can avoid unnatural protrusions or depressions on the surface of the reconstructed three-dimensional model of the object, and improve the reconstruction effect of the three-dimensional model. It should be understood that various method embodiments of the present application are executed by various types of terminal devices or servers, such as mobile phones, tablet computers, notebook computers, desktop computers, and wearable devices.
请参阅图1,示出了本申请实施例提供的一种物体模型的重建方法,包括:Please refer to FIG. 1, which shows a method for reconstructing an object model provided by an embodiment of the present application, including:
101、获取预设的网格模板以及包含目标物体的原始图像;101. Obtain a preset grid template and an original image containing a target object;
首先,获取预设的一个网格模板。该网格模板包含原始三维模型的各个顶点的初始位置坐标以及所述各个顶点之间的连接关系数据。例如,该网格模板可以是一个Mesh文件,该文件存储了原始三维模型的顶点位置以及顶点之间的连接关系,该原始三维模型可以是球体、立方体和长方体等各种形状的模型,而为了使各个顶点位置的分布比较均匀,一般建议采用球体形状的原始三维模型。假设该原始三维模型具有N个顶点,则该网格模板包含该N个顶点中每个顶点的三维位置坐标以及该N个顶点之间的连接关系数据,根据连接关系数据可以确定这N个顶点之间是如何连接的,从而可以获得对应的三维模型。First, grab one of the preset grid templates. The grid template includes the initial position coordinates of each vertex of the original three-dimensional model and connection relationship data between the various vertexes. For example, the grid template can be a Mesh file, which stores the vertex positions and the connection relationship between vertices of the original 3D model. The original 3D model can be a model of various shapes such as a sphere, a cube, and a cuboid. To make the distribution of each vertex position relatively uniform, it is generally recommended to use the original 3D model in the shape of a sphere. Assuming that the original 3D model has N vertices, the grid template includes the 3D position coordinates of each of the N vertices and the connection relationship data between the N vertices, and the N vertices can be determined according to the connection relationship data How are they connected, so that the corresponding 3D model can be obtained.
另外,还需要获取一幅包含目标物体的原始图像,该目标物体是需要重建出对应三维模型的任意类型的物体,例如可以是一个沙发、一个桌子或者一张床等。该原始图像具体可以是该目标物体的RGB图像或者灰度图像。In addition, it is also necessary to obtain an original image containing a target object, which is any type of object whose corresponding 3D model needs to be reconstructed, such as a sofa, a table, or a bed. Specifically, the original image may be an RGB image or a grayscale image of the target object.
102. Input the original image into a pre-built encoding network for processing, and output an initial feature vector corresponding to the original image.
After the original image is acquired, it is input into a pre-built encoding network to obtain the feature vector corresponding to the original image. The encoding network is a neural network for extracting image features; it generally processes the image through convolutional layers, pooling layers, fully connected layers, and the like to extract image features and obtain the corresponding feature vector. The present application does not limit the type or structure of this neural network.
FIG. 2 is a schematic structural diagram of an encoding network provided by an embodiment of the present application. An input original image of dimensions 224*224*3 is processed by several network layers of the encoding network, including convolutional layers, ReLU activation layers, max-pooling layers, and fully connected layers, finally yielding feature data of dimensions 1*1*1024. This feature data can be regarded as a vector of 1024 elements, i.e., the initial feature vector corresponding to the 224*224*3 original image. In addition, to avoid gradient explosion or vanishing gradients caused by an overly deep model, multiple stacked residual modules may be added to the encoding network shown in FIG. 2. The structure of each residual module is shown in FIG. 3: the input feature map is processed by two 3*3 convolution blocks with edge padding to extract local features, after which a pooling layer integrates and filters the features, reducing their dimensionality. The output of each residual module is added to its original input, forming a new data transmission path that gives the residual network the ability to perform identity mapping. In practice, the residual network ResNet-18 provided by the PyTorch framework, together with its pre-trained weights, can be used as the encoding network.
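The identity-mapping idea of the residual modules can be illustrated with a minimal sketch. Plain matrix products stand in for the 3*3 padded convolutions of FIG. 3, and all names and sizes here are illustrative, not the encoder's actual layers:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    # Two learned transforms plus a skip connection; in FIG. 3 the
    # transforms are 3*3 padded convolutions, here plain matrix products.
    y = relu(x @ w1)
    y = y @ w2
    return relu(y + x)  # adding the original input forms the skip path

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 16))           # a 16-d feature for one sample
w1 = rng.normal(size=(16, 16)) * 0.1   # illustrative weights
w2 = rng.normal(size=(16, 16)) * 0.1
out = residual_block(x, w1, w2)
print(out.shape)  # (1, 16)
```

With both weight matrices set to zero the block reduces to relu(x), i.e., the skip path alone carries the rectified input through unchanged, which is the identity-mapping behavior the residual connection provides.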
103. Fuse the initial feature vector with the initial position coordinates of each vertex to obtain a first feature matrix.
After the initial feature vector is obtained, it is fused with the initial position coordinates of each vertex in the mesh template to obtain a first feature matrix, which contains a target feature vector corresponding to each vertex. The initial position coordinates (x, y, z) of a vertex can be regarded as a 3-element vector, so this 3-element vector and the initial feature vector can be fused by concatenation to obtain a new vector, namely the target feature vector. The target feature vectors of all the different vertices form a matrix, namely the first feature matrix.
In an embodiment of the present application, fusing the initial feature vector with the initial position coordinates of each vertex to obtain the first feature matrix may include:
(1) representing the initial position coordinates of the vertices as a matrix of dimensions N*3, where N is the number of vertices;
(2) concatenating the initial feature vector with the N*3 matrix along the second dimension to obtain the first feature matrix of dimensions N*(3+X), where X is the number of elements of the initial feature vector.
Assuming there are N vertices in total and the initial position coordinates of each vertex are represented as a 3-element vector, the initial position coordinates of the N vertices can be represented as an N*3 matrix. Further assuming the initial feature vector has X elements, concatenation along the second dimension yields an N*(X+3) matrix, which serves as the first feature matrix.
In an embodiment of the present application, before the initial feature vector is fused with the initial position coordinates of the vertices, the method may further include:
(1) obtaining a category vector corresponding to the target object, where the category vector represents the object category to which the target object belongs;
(2) concatenating the category vector with the initial feature vector to obtain a concatenated feature vector.
In this case, fusing the initial feature vector with the initial position coordinates of the vertices may specifically be:
fusing the concatenated feature vector with the initial position coordinates of the vertices.
To improve the generality of the present application and make it compatible with three-dimensional model reconstruction for objects of multiple different categories, a category vector may first be concatenated with the initial feature vector, and the resulting vector is then fused with the initial position coordinates.
Specifically, each object category corresponds to a unique category vector, so the category vector may take the form of a one-hot encoding. For example, if the dataset to be processed contains images of four categories of objects (tables, chairs, computers, and airplanes), the category vectors may be preset as (0, 0, 0, 1) for tables, (0, 0, 1, 0) for chairs, (0, 1, 0, 0) for computers, and (1, 0, 0, 0) for airplanes. If the target object in the original image currently being processed is a table, the category vector (0, 0, 0, 1) corresponding to tables is obtained and concatenated with the initial feature vector.
A specific concatenation example is as follows. Assume there are 2562 vertices in total and the initial position coordinates of each vertex are represented as a 3-element vector, so the initial position coordinates of the 2562 vertices can be represented as a 2562*3 matrix. If the initial feature vector has 1024 elements and the category vector has 9 elements, the initial feature vector and the category vector are first concatenated to obtain a new feature vector of 1033 elements, which is then concatenated with the 2562*3 matrix along the second dimension to obtain a 2562*1036 matrix as the first feature matrix. Each 1*1036 vector in the first feature matrix is the semantic vector corresponding to one model vertex.
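The concatenation steps in this example can be sketched as follows (NumPy shapes only; the feature values are placeholders):

```python
import numpy as np

N, FEAT, NUM_CLASSES = 2562, 1024, 9   # sizes taken from the example above

coords = np.zeros((N, 3))              # placeholder vertex coordinates (N*3)
img_feat = np.ones(FEAT)               # placeholder 1024-element image feature
category = np.eye(NUM_CLASSES)[3]      # one-hot vector for, say, class index 3

# 1) concatenate image feature and category vector: 1024 + 9 = 1033 elements
fused_vec = np.concatenate([img_feat, category])

# 2) repeat it for every vertex and concatenate with the coordinates along
#    the second dimension: (N, 3) + (N, 1033) -> (N, 1036)
first_feature_matrix = np.concatenate(
    [coords, np.tile(fused_vec, (N, 1))], axis=1)

print(first_feature_matrix.shape)  # (2562, 1036)
```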
104. Input the first feature matrix into a pre-built decoding network for processing, and output a second feature matrix.
After the first feature matrix is obtained, it is input into a pre-built decoding network to obtain a second feature matrix, which contains the transformed target position coordinates of each vertex. The decoding network is a neural network containing a fully connected layer and an attention mechanism layer. For each vertex, the attention mechanism layer fuses the target feature vectors of all the vertices according to the correlation between each of those vertices and the current vertex, obtaining a fused target feature vector for the current vertex; the fused target feature vector is used to determine the target position coordinates of that vertex. An ordinary decoding network usually predicts the coordinate offsets of the mesh-template vertices with a stack of fully connected layers to obtain the transformed target position coordinates. However, such a method considers only the global information of the image and the information of a single target point during prediction, ignoring the points related to the target point, and in particular the mutual influence between locally adjacent points, which easily causes unnatural protrusions or depressions on the surface of the reconstructed three-dimensional model. To address this problem, the present application adds an attention mechanism layer to the decoding network to capture the mutual positional influence between different vertices of the same object.
In an embodiment of the present application, the decoding network contains multiple cascaded decoding modules, each of which in turn contains a fully connected layer, an attention mechanism layer, and a batch normalization layer. Inputting the first feature matrix into the pre-built decoding network for processing and outputting the second feature matrix may include:
(1) inputting the first feature matrix into the fully connected layer of the first decoding module of the decoding network for processing, and outputting a first intermediate matrix;
(2) inputting the first intermediate matrix into the attention mechanism layer of the first decoding module for processing, and outputting a second intermediate matrix;
(3) concatenating the second intermediate matrix with the first intermediate matrix to obtain a third intermediate matrix;
(4) inputting the third intermediate matrix into the batch normalization layer of the first decoding module for processing to obtain a fourth intermediate matrix;
(5) inputting the fourth intermediate matrix into the second decoding module of the decoding network, and continuing with the same processing as in the first decoding module until the second feature matrix output by the last decoding module of the decoding network is obtained.
FIG. 4 is a schematic structural diagram of a decoding network provided by an embodiment of the present application. The decoding network includes multiple stacked decoding modules, each of which consists, in order, of a fully connected layer, an attention mechanism layer, and a batch normalization layer. The fully connected layer can be implemented with a 1*1 convolution and predicts the coordinate offset of a single vertex. The attention mechanism layer then selects and extracts the coordinate information of the vertices most relevant to the current vertex (generally its local neighbors), which is concatenated with the original output and processed by the batch normalization layer (the Batch Normalization layer, also called the batch reduction layer) so that the data follows a Gaussian distribution before being fed into the subsequent network.
Further, the first intermediate matrix contains a target feature vector corresponding to each vertex. Inputting the first intermediate matrix into the attention mechanism layer of the first decoding module for processing and outputting the second intermediate matrix may include:
for each vertex, computing the correlation weights between each of the vertices and the current vertex according to a trainable weight matrix, and then performing a weighted sum of the target feature vectors of the vertices according to their respective correlation weights to obtain the fused target feature vector of the current vertex, where the second intermediate matrix is a matrix composed of the fused target feature vectors of all the vertices.
FIG. 5 is a schematic diagram of the processing performed by the attention mechanism layer used in the present application. After the first feature matrix is processed by the fully connected layer of the first decoding module, a first intermediate matrix I∈R^(N*C) is obtained, where N is the number of vertices and C is the number of elements of the target feature vector of each vertex. After the first intermediate matrix I is processed by the attention mechanism layer, a second intermediate matrix A∈R^(N*C) is obtained, and the two matrices are then concatenated along the second dimension to obtain a third intermediate matrix O∈R^(N*2C). Next, the third intermediate matrix O is input into the batch normalization layer for processing and then passed to the next decoding module, which performs the same processing, and so on, until the second feature matrix is finally output. This process may be referred to as inter-point attention processing.
After the first intermediate matrix I is input into the attention mechanism layer, the specific processing is as follows. For a given vertex P, a trainable weight matrix is used to compute the correlation weights between each of the other N-1 vertices (excluding vertex P itself) and vertex P. A weighted sum of the target feature vectors of those N-1 vertices is then computed according to their respective correlation weights, yielding the fused target feature vector of vertex P; the dimensionality of the feature vector is unchanged in this process (it remains C). After the fused target feature vectors of all N vertices are obtained in the same way as for vertex P, the N fused target feature vectors form the second intermediate matrix A∈R^(N*C).
The correlation weights can be computed with the following formula (1.1):

e_{i,j} = p_i W p_j^T    (1.1)

Here, e_{i,j} denotes the correlation weight between any two vertices i and j among the N vertices, p_i denotes the target feature vector of vertex i, p_j denotes the target feature vector of vertex j, and W is a trainable weight matrix. The initial values of the weight matrix can be set manually, after which its values are iteratively updated during the training of the decoding network. Assuming p_i and p_j are both 1*1036 vectors, the weight matrix W is a 1036*1036 matrix, so the computed correlation weight is a scalar indicating the degree of correlation between vertices i and j.
In addition, the obtained correlation weights of each vertex can be processed with the following formula (1.2) to ensure that the correlation weights for a given vertex sum to 1:

a_i = softmax(e_i)    (1.2)

Here, a_i denotes e_i after softmax normalization, and e_i is a vector obtained by concatenating the e_{i,j} along the j-th dimension, representing the correlation weights between all vertices other than vertex i and vertex i.
The fused target feature vector of vertex i can be expressed by the following formula (1.3):

A_i = Σ_{j≠i} a_{i,j} p_j    (1.3)

Here, A_i denotes the fused target feature vector of vertex i, and a_{i,j} denotes the normalized correlation weight between vertex j and vertex i.
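Formulas (1.1) through (1.3) can be sketched in code as follows. For brevity the sketch applies the row-wise softmax over all vertices, including vertex i itself, rather than excluding it as in the description above, and the sizes are illustrative:

```python
import numpy as np

def inter_point_attention(P, W):
    """Fuse per-vertex features by inter-point attention (formulas 1.1-1.3).

    P: (N, C) matrix of per-vertex target feature vectors.
    W: (C, C) trainable weight matrix.
    """
    E = P @ W @ P.T                       # (1.1): e_ij = p_i W p_j^T
    E = E - E.max(axis=1, keepdims=True)  # numerical stability
    a = np.exp(E)
    a /= a.sum(axis=1, keepdims=True)     # (1.2): row-wise softmax
    return a @ P                          # (1.3): A_i = sum_j a_ij p_j

rng = np.random.default_rng(1)
N, C = 6, 8                               # tiny sizes for illustration
P = rng.normal(size=(N, C))
W = rng.normal(size=(C, C)) * 0.1
A = inter_point_attention(P, W)
O = np.concatenate([P, A], axis=1)        # third intermediate matrix, (N, 2C)
print(A.shape, O.shape)  # (6, 8) (6, 16)
```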
Assuming the first feature matrix is a 2562*1036 matrix, after it is input into the decoding network, the stacked decoding modules gradually reduce its dimensionality (through the fully connected layers), finally yielding a 2562*3 result matrix representing the transformed three-dimensional position coordinates of the 2562 vertices.
105. Reconstruct the target three-dimensional model corresponding to the target object according to the target position coordinates of each vertex and the connection relationship data between the vertices.
Finally, the positions of the vertices in the reconstructed three-dimensional model can be determined from the target position coordinates of each vertex, and a new three-dimensional model can then be constructed by combining these positions with the connection relationship data between the vertices contained in the mesh template. This new model serves as the target three-dimensional model corresponding to the target object.
In an embodiment of the present application, after the target three-dimensional model corresponding to the target object is reconstructed, the method may further include:
(1) computing the sizes of all dihedral angles of the target three-dimensional model according to the position coordinates of the vertices of the target three-dimensional model;
(2) computing a smoothing loss according to the sizes of all the dihedral angles;
(3) optimizing and updating the parameters of the decoding network based on the smoothing loss.
After the target three-dimensional model is constructed, since all vertex coordinates and the connections between vertices are known, the size of every dihedral angle of the target three-dimensional model can be computed conveniently. The smoothing loss can then be computed from the sizes of all the dihedral angles, and this smoothing loss is used as the objective function to optimize and update the parameters of the decoding network.
Further, computing the smoothing loss according to the sizes of all the dihedral angles may specifically be:
computing the smoothing loss with the following formula (1.4):

L_smooth = Σ_{i,j∈F} (cos θ_{i,j} + 1)    (1.4)

Here, L_smooth denotes the smoothing loss, θ_{i,j} denotes the dihedral angle between any two adjacent planes i and j of the target three-dimensional model, and F denotes all planes of the target three-dimensional model. During the fitting of the mesh template to the target three-dimensional model, the connections between the vertices remain unchanged, so each dihedral angle can be computed conveniently from the vertex coordinates, after which the smoothing loss is computed with formula (1.4).
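A sketch of such a dihedral-angle smoothing loss is given below, under the assumption that each pair of adjacent faces contributes a term of the form cos θ + 1 (equivalently 1 − n_i·n_j for consistently oriented unit normals), so that coplanar faces contribute zero and creases are penalized; the mesh and function names are illustrative:

```python
import numpy as np
from itertools import combinations

def smoothing_loss(vertices, faces):
    """Assumed sketch: zero when all adjacent faces are coplanar,
    growing as the surface bends or creases."""
    # unit normal of each triangular face
    v = vertices[faces]                       # (F, 3, 3)
    n = np.cross(v[:, 1] - v[:, 0], v[:, 2] - v[:, 0])
    n /= np.linalg.norm(n, axis=1, keepdims=True)

    # faces sharing an edge are adjacent; the connectivity is fixed by
    # the mesh template, so this map never changes during fitting
    edge_to_faces = {}
    for f_idx, face in enumerate(faces):
        for a, b in combinations(sorted(face), 2):
            edge_to_faces.setdefault((a, b), []).append(f_idx)

    loss = 0.0
    for pair in edge_to_faces.values():
        if len(pair) == 2:
            i, j = pair
            # for consistently oriented normals, coplanar faces give
            # n_i . n_j = 1, i.e. a zero contribution
            loss += 1.0 - float(n[i] @ n[j])
    return loss

# two coplanar triangles -> zero loss
verts = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [1., 1., 0.]])
tris = np.array([[0, 1, 2], [1, 3, 2]])
print(round(smoothing_loss(verts, tris), 6))  # 0.0
```

Lifting the fourth vertex out of the plane bends the shared edge and makes the loss strictly positive, which is the behavior a smoothness constraint on the surface should have.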
The surfaces of man-made objects in indoor scenes are usually smooth, whereas three-dimensional model reconstruction that predicts the coordinates of individual vertices often introduces considerable noise on the reconstructed surface (due to, among other factors, the limited generalization of the neural network), leaving the object surface uneven. To solve this problem, the embodiments of the present application introduce a smoothing loss when training the neural network, constraining the flatness of the object surface; this makes the surface of the reconstructed three-dimensional model flatter and smoother and improves the model reconstruction quality.
In the embodiments of the present application, the original image containing the target object and a preset mesh template are first obtained, the feature vector of the original image is extracted, and this feature vector is then fused with the position coordinates of the vertices of the mesh template to obtain a feature matrix. Next, a decoding network processes this feature matrix, introducing an attention mechanism during decoding to take the positional correlation between the vertices of the object into account, and the decoded target position coordinates of each vertex are obtained. Finally, the three-dimensional model corresponding to the target object is reconstructed from the obtained target position coordinates of each vertex and the previously obtained connection relationship data between the vertices. This process fuses feature vectors according to the correlation between the position coordinates of the object's vertices and thus takes the mutual influence between the vertices into account, thereby avoiding unnatural protrusions or depressions on the surface of the reconstructed three-dimensional model and improving the reconstruction quality.
FIG. 6 is a schematic diagram of an operation of the object model reconstruction method provided by an embodiment of the present application. First, an image of a target object is obtained and processed by the encoding network to obtain the corresponding feature vector. This feature vector is then concatenated with the category vector of the target object and with the vertex coordinates in the mesh template. Next, the resulting feature matrix is input into the decoding network, which consists of stacked decoding modules, each of which in turn contains a fully connected layer, an attention mechanism layer, and a batch normalization layer; through inter-point attention, the transformed target position coordinates of each vertex are obtained. Finally, the three-dimensional model corresponding to the target object is reconstructed from the target position coordinates of each vertex and the connection relationship data between the vertices. In addition, the smoothing loss can be computed from the dihedral angles of the reconstructed three-dimensional model, and the decoding network can be optimized and trained based on this loss to improve the flatness of the surface of the resulting three-dimensional model.
FIG. 7 is a schematic diagram of the processing results of the object model reconstruction method proposed in the present application. The five three-dimensional models in the upper part of FIG. 7 were reconstructed without the inter-point attention mechanism, while the five three-dimensional models in the lower part are the corresponding models reconstructed with the inter-point attention mechanism. As can be seen, the five models in the upper part contain many unnatural protrusions and depressions (see the dashed boxes in the figure), whereas these are absent from the five models in the lower part, giving a better reconstruction result.
To verify the three-dimensional model reconstruction performance of the present application, a reconstruction experiment was conducted on the same dataset as the original Total3D model of the prior art. The model's input is a spherical mesh template with 2562 vertices and a 224*224 input image. Table 1 below compares the three-dimensional model reconstruction accuracy of the operation model of the present application with the original Total3D model and the AtlasNet model of the prior art on real-scene indoor objects of nine categories from the Pix3D dataset. The chamfer distance reflects the positional deviation between the vertices of the reconstructed object model and the ground truth, and the normal distance reflects the deviation between the normal vectors of the reconstructed surface and the ground-truth surface. From the comparison of the reconstruction metrics shown in Table 1, the operation model proposed in the present application achieves smaller positional deviation and smaller normal-vector deviation than the prior-art Total3D and AtlasNet models, i.e., it effectively improves the reconstruction quality of the three-dimensional model.
Table 1
(The body of Table 1 is rendered as an image in the original publication; it reports the per-category chamfer distance and normal distance of AtlasNet, the original Total3D model, and the model of the present application on the Pix3D dataset.)
FIG. 8 compares the three-dimensional model reconstruction results of the present application with those of the prior-art original Total3D model. The left column shows the input images, the middle column shows the reconstruction results obtained with the original Total3D model, and the right column shows the reconstruction results obtained with the present application. As can be seen, the operation model proposed in the present application generates more accurate and smoother three-dimensional object models.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
The above mainly describes a method for reconstructing an object model; an apparatus for reconstructing an object model is described below.
请参阅图9,本申请实施例中一种物体模型的重建装置的一个实施例包括:Please refer to FIG. 9, an embodiment of an object model reconstruction device in the embodiment of the present application includes:
A data acquisition module 801, configured to acquire a preset mesh template and an original image containing the target object, the mesh template containing the initial position coordinates of each vertex of an original three-dimensional model and connection relationship data between the vertices;
A feature encoding module 802, configured to input the original image into a pre-built encoding network for processing and output an initial feature vector corresponding to the original image, the encoding network being a neural network for extracting image features;
A vector fusion module 803, configured to fuse the initial feature vector with the initial position coordinates of the vertices to obtain a first feature matrix, the first feature matrix containing a target feature vector corresponding to each of the vertices;
A feature decoding module 804, configured to input the first feature matrix into a pre-built decoding network for processing and output a second feature matrix, the second feature matrix containing target position coordinates corresponding to each of the vertices, the decoding network being a neural network containing a fully connected layer and an attention mechanism layer, wherein for each of the vertices the attention mechanism layer fuses the target feature vectors of all the vertices according to the correlation between each vertex and that vertex, obtaining a fused target feature vector for that vertex, the fused target feature vector being used to determine the target position coordinates of that vertex;
A model reconstruction module 805, configured to reconstruct the target three-dimensional model corresponding to the target object according to the target position coordinates of the vertices and the connection relationship data between the vertices.
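The reconstruction step performed by module 805 amounts to pairing the predicted per-vertex coordinates with the template's fixed connectivity. A minimal sketch of that data layout; the function name and dictionary representation are illustrative assumptions, not the patent's own structures:

```python
import numpy as np

def rebuild_mesh(target_coords, faces):
    """Pair predicted per-vertex coordinates with the template's fixed
    connectivity to form the reconstructed model (illustrative layout)."""
    vertices = np.asarray(target_coords, dtype=float)   # N x 3 coordinates
    faces = np.asarray(faces, dtype=int)                # M x 3 vertex indices
    assert faces.max() < len(vertices)                  # connectivity must be valid
    return {"vertices": vertices, "faces": faces}

# toy example: a single triangle
mesh = rebuild_mesh([[0, 0, 0], [1, 0, 0], [0, 1, 0]], [[0, 1, 2]])
print(mesh["vertices"].shape, mesh["faces"].shape)  # (3, 3) (1, 3)
```

Because the template's connectivity never changes, only the vertex coordinates need to be predicted by the network.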
In one embodiment of the present application, the apparatus for reconstructing an object model may further include:
A category vector acquisition module, configured to acquire a category vector corresponding to the target object, the category vector being used to represent the object category to which the target object belongs;
A vector concatenation module, configured to concatenate the category vector with the initial feature vector to obtain a concatenated feature vector;
The vector fusion module may specifically be configured to:
fuse the concatenated feature vector with the initial position coordinates of the vertices.
In one embodiment of the present application, the vector fusion module may include:
A matrix representation unit, configured to represent the initial position coordinates of the vertices as a matrix of dimension N*3, N being the number of the vertices;
A vector concatenation unit, configured to concatenate the initial feature vector with the matrix of dimension N*3 along the second dimension to obtain the first feature matrix of dimension N*(3+X), X being the number of elements of the initial feature vector.
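The fusion just described tiles the global image feature onto every vertex and concatenates it with the N*3 coordinate matrix, giving an N*(3+X) first feature matrix. A sketch under that reading; the vertex count and feature width below are example values, not taken from the patent:

```python
import numpy as np

def fuse_features(image_feature, vertex_coords):
    """Concatenate each vertex's 3-D coordinates with a copy of the
    global image feature, yielding an N x (3 + X) matrix."""
    n = vertex_coords.shape[0]                            # N vertices
    tiled = np.repeat(image_feature[None, :], n, axis=0)  # N x X copies
    return np.concatenate([vertex_coords, tiled], axis=1) # N x (3 + X)

coords = np.zeros((642, 3))   # e.g. a sphere template with 642 vertices
feat = np.ones(1024)          # X = 1024 global image feature (example value)
fused = fuse_features(feat, coords)
print(fused.shape)            # (642, 1027)
```

Each row of the result is one vertex's target feature vector: its position followed by the shared image feature.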
In one embodiment of the present application, the decoding network contains a plurality of cascaded decoding modules, each decoding module containing, in order, a fully connected layer, an attention mechanism layer and a batch normalization layer, and the feature decoding module may include:
A first processing unit, configured to input the first feature matrix into the fully connected layer of the first decoding module of the decoding network for processing and output a first intermediate matrix;
A second processing unit, configured to input the first intermediate matrix into the attention mechanism layer of the first decoding module for processing and output a second intermediate matrix;
A third processing unit, configured to concatenate the second intermediate matrix with the first intermediate matrix to obtain a third intermediate matrix;
A fourth processing unit, configured to input the third intermediate matrix into the batch normalization layer of the first decoding module for processing to obtain a fourth intermediate matrix;
A fifth processing unit, configured to input the fourth intermediate matrix into the second decoding module of the decoding network and continue with the same processing as in the first decoding module, until the second feature matrix output by the last decoding module of the decoding network is obtained.
Further, the first intermediate matrix contains the target feature vector corresponding to each of the vertices, and the second processing unit may specifically be configured to:
for each of the vertices, calculate the correlation weight between every vertex and that vertex according to a trainable weight matrix, and then perform a weighted summation of the target feature vectors of all the vertices according to their respective correlation weights to obtain the fused target feature vector for that vertex, the second intermediate matrix being a matrix composed of the fused target feature vectors of the vertices.
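One decoding module, as described above, runs a fully connected layer, then vertex-to-vertex attention (correlation weights from trainable matrices, followed by a weighted sum), then concatenation with the attention input, then batch normalization. A sketch under those steps; the softmax scoring, weight shapes and scaling are illustrative assumptions, not the patent's exact parameterization:

```python
import numpy as np

rng = np.random.default_rng(0)

def fc(x, w, b):                       # fully connected layer
    return x @ w + b

def attention(h, wq, wk):              # vertex-to-vertex attention fusion
    scores = (h @ wq) @ (h @ wk).T / np.sqrt(h.shape[1])
    a = np.exp(scores - scores.max(axis=1, keepdims=True))
    a /= a.sum(axis=1, keepdims=True)  # correlation weights, rows sum to 1
    return a @ h                       # weighted sum over all vertices

def batch_norm(x, eps=1e-5):           # per-feature normalization
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def decode_module(x, params):
    """FC -> attention -> concatenate -> batch norm, as in one module."""
    h1 = fc(x, *params["fc"])                       # first intermediate matrix
    h2 = attention(h1, params["wq"], params["wk"])  # second intermediate matrix
    h3 = np.concatenate([h2, h1], axis=1)           # third intermediate matrix
    return batch_norm(h3)                           # fourth intermediate matrix

N, D = 6, 8  # toy sizes: 6 vertices, 8 features each
params = {"fc": (rng.normal(size=(D, D)), np.zeros(D)),
          "wq": rng.normal(size=(D, D)), "wk": rng.normal(size=(D, D))}
out = decode_module(rng.normal(size=(N, D)), params)
print(out.shape)  # (6, 16): concatenation doubles the feature width
```

Cascading such modules, with the last one projecting down to 3 channels, would yield the N*3 second feature matrix of target coordinates.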
In one embodiment of the present application, the apparatus for reconstructing an object model may further include:
A dihedral angle calculation module, configured to calculate the sizes of all dihedral angles of the target three-dimensional model according to the position coordinates of the vertices of the target three-dimensional model;
A smoothing loss calculation module, configured to calculate a smoothing loss according to the sizes of all the dihedral angles;
A network parameter optimization module, configured to optimize and update the parameters of the decoding network based on the smoothing loss.
Further, the smoothing loss calculation module is specifically configured to:
calculate the smoothing loss using the following formula:
[Equation image PCTCN2021093783-appb-000007]
where L_smooth denotes the smoothing loss, θ_{i,j} denotes the dihedral angle between any two planes i and j of the target three-dimensional model, and F denotes all the planes of the target three-dimensional model.
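The dihedral angles can be computed from vertex positions via face normals. A minimal sketch assuming triangle faces; the penalty shown is one common smoothness choice (zero when neighbouring faces are coplanar, growing as the surface folds), while the patent's exact L_smooth is defined by its equation image and may differ:

```python
import numpy as np

def dihedral_angles(vertices, faces):
    """Dihedral angle at every edge shared by two triangles, computed
    from face normals (a flat surface gives an angle of pi)."""
    vertices = np.asarray(vertices, dtype=float)
    normals = []
    for a, b, c in faces:
        n = np.cross(vertices[b] - vertices[a], vertices[c] - vertices[a])
        normals.append(n / np.linalg.norm(n))
    edge_faces = {}
    for fi, (a, b, c) in enumerate(faces):
        for e in ((a, b), (b, c), (c, a)):
            edge_faces.setdefault(frozenset(e), []).append(fi)
    angles = []
    for shared in edge_faces.values():
        if len(shared) == 2:
            cos_n = np.clip(normals[shared[0]] @ normals[shared[1]], -1.0, 1.0)
            angles.append(np.pi - np.arccos(cos_n))  # dihedral = pi - normal angle
    return np.array(angles)

def smooth_loss(angles):
    # Illustrative penalty: zero for flat neighbours (theta = pi), positive
    # for folds. The patent's exact formula is given by its equation image.
    return float(np.sum((np.cos(angles) + 1.0) ** 2))

# two coplanar triangles: one shared edge with dihedral angle pi, loss ~ 0
verts = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]]
faces = [(0, 1, 2), (1, 3, 2)]
angles = dihedral_angles(verts, faces)
print(angles, smooth_loss(angles))
```

Backpropagating such a loss through the predicted vertex coordinates is what lets the decoding network be penalized for unnatural bulges or dents on the surface.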
An embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements any one of the methods for reconstructing an object model shown in FIG. 1.
An embodiment of the present application further provides a computer program product which, when run on a terminal device, causes the terminal device to execute any one of the methods for reconstructing an object model shown in FIG. 1.
FIG. 10 is a schematic diagram of a terminal device provided by an embodiment of the present application. As shown in FIG. 10, the terminal device 9 of this embodiment includes: a processor 90, a memory 91, and a computer program 92 stored in the memory 91 and executable on the processor 90. When executing the computer program 92, the processor 90 implements the steps in the embodiments of the methods for reconstructing an object model described above, such as steps 101 to 105 shown in FIG. 1; alternatively, when executing the computer program 92, the processor 90 implements the functions of the modules/units in the apparatus embodiments described above, such as the functions of modules 801 to 805 shown in FIG. 9.
The computer program 92 may be divided into one or more modules/units, which are stored in the memory 91 and executed by the processor 90 to complete the present application. The one or more modules/units may be a series of computer program instruction segments capable of accomplishing specific functions, the instruction segments being used to describe the execution process of the computer program 92 in the terminal device 9.
The processor 90 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 91 may be an internal storage unit of the terminal device 9, such as a hard disk or memory of the terminal device 9. The memory 91 may also be an external storage device of the terminal device 9, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card equipped on the terminal device 9. Further, the memory 91 may include both an internal storage unit and an external storage device of the terminal device 9. The memory 91 is used to store the computer program and other programs and data required by the terminal device, and may also be used to temporarily store data that has been output or is to be output.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the division into the above functional units and modules is used only as an example. In practical applications, the above functions may be allocated to different functional units or modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit; the integrated units may be implemented in the form of hardware or in the form of software functional units. In addition, the specific names of the functional units and modules are only for the convenience of distinguishing them from one another and are not intended to limit the protection scope of the present application. For the specific working process of the units and modules in the above system, reference may be made to the corresponding process in the foregoing method embodiments, and details are not repeated here.
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working process of the system, apparatus and units described above, reference may be made to the corresponding process in the foregoing method embodiments, and details are not repeated here.
In the above embodiments, the description of each embodiment has its own emphasis. For parts that are not detailed or recorded in a certain embodiment, reference may be made to the relevant descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are executed by hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each specific application, but such implementations should not be regarded as exceeding the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the system embodiments described above are merely illustrative; for instance, the division into modules or units is merely a division by logical function, and there may be other divisions in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses or units, and may be in electrical, mechanical or other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated units may be implemented in the form of hardware or in the form of software functional units.
If the integrated units are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the present application may implement all or part of the processes in the methods of the above embodiments by instructing the relevant hardware through a computer program. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, may implement the steps of the above method embodiments. The computer program includes computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include: any entity or apparatus capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and so on. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.
The above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements for some of the technical features; and these modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all be included within the protection scope of the present application.

Claims (10)

  1. A method for reconstructing an object model, characterized in that it comprises:
    acquiring a preset mesh template and an original image containing a target object, the mesh template containing the initial position coordinates of each vertex of an original three-dimensional model and connection relationship data between the vertices;
    inputting the original image into a pre-built encoding network for processing, and outputting an initial feature vector corresponding to the original image, the encoding network being a neural network for extracting image features;
    fusing the initial feature vector with the initial position coordinates of the vertices to obtain a first feature matrix, the first feature matrix containing a target feature vector corresponding to each of the vertices;
    inputting the first feature matrix into a pre-built decoding network for processing, and outputting a second feature matrix, the second feature matrix containing target position coordinates corresponding to each of the vertices, the decoding network being a neural network containing a fully connected layer and an attention mechanism layer, wherein for each of the vertices the attention mechanism layer fuses the target feature vectors of all the vertices according to the correlation between each vertex and that vertex to obtain a fused target feature vector for that vertex, the fused target feature vector being used to determine the target position coordinates of that vertex;
    reconstructing a target three-dimensional model corresponding to the target object according to the target position coordinates of the vertices and the connection relationship data between the vertices.
  2. The method according to claim 1, characterized in that, before fusing the initial feature vector with the initial position coordinates of the vertices, the method further comprises:
    acquiring a category vector corresponding to the target object, the category vector being used to represent the object category to which the target object belongs;
    concatenating the category vector with the initial feature vector to obtain a concatenated feature vector;
    the fusing the initial feature vector with the initial position coordinates of the vertices being specifically:
    fusing the concatenated feature vector with the initial position coordinates of the vertices.
  3. The method according to claim 1, characterized in that the fusing the initial feature vector with the initial position coordinates of the vertices to obtain a first feature matrix comprises:
    representing the initial position coordinates of the vertices as a matrix of dimension N*3, N being the number of the vertices;
    concatenating the initial feature vector with the matrix of dimension N*3 along the second dimension to obtain the first feature matrix of dimension N*(3+X), X being the number of elements of the initial feature vector.
  4. The method according to claim 1, characterized in that the decoding network contains a plurality of cascaded decoding modules, each decoding module containing, in order, a fully connected layer, an attention mechanism layer and a batch normalization layer, and the inputting the first feature matrix into a pre-built decoding network for processing and outputting a second feature matrix comprises:
    inputting the first feature matrix into the fully connected layer of the first decoding module of the decoding network for processing, and outputting a first intermediate matrix;
    inputting the first intermediate matrix into the attention mechanism layer of the first decoding module for processing, and outputting a second intermediate matrix;
    concatenating the second intermediate matrix with the first intermediate matrix to obtain a third intermediate matrix;
    inputting the third intermediate matrix into the batch normalization layer of the first decoding module for processing to obtain a fourth intermediate matrix;
    inputting the fourth intermediate matrix into the second decoding module of the decoding network, and continuing with the same processing as in the first decoding module until the second feature matrix output by the last decoding module of the decoding network is obtained.
  5. The method according to claim 4, characterized in that the first intermediate matrix contains the target feature vector corresponding to each of the vertices, and the inputting the first intermediate matrix into the attention mechanism layer of the first decoding module for processing and outputting a second intermediate matrix comprises:
    for each of the vertices, calculating the correlation weight between every vertex and that vertex according to a trainable weight matrix, and then performing a weighted summation of the target feature vectors of all the vertices according to their respective correlation weights to obtain the fused target feature vector for that vertex, the second intermediate matrix being a matrix composed of the fused target feature vectors of the vertices.
  6. The method according to any one of claims 1 to 5, characterized in that, after reconstructing the target three-dimensional model corresponding to the target object, the method further comprises:
    calculating the sizes of all dihedral angles of the target three-dimensional model according to the position coordinates of the vertices of the target three-dimensional model;
    calculating a smoothing loss according to the sizes of all the dihedral angles;
    optimizing and updating parameters of the decoding network based on the smoothing loss.
  7. The method according to claim 6, characterized in that the calculating a smoothing loss according to the sizes of all the dihedral angles is specifically:
    calculating the smoothing loss using the following formula:
    [Equation image PCTCN2021093783-appb-100001]
    where L_smooth denotes the smoothing loss, θ_{i,j} denotes the dihedral angle between any two planes i and j of the target three-dimensional model, and F denotes all the planes of the target three-dimensional model.
  8. An apparatus for reconstructing an object model, characterized in that it comprises:
    a data acquisition module, configured to acquire a preset mesh template and an original image containing a target object, the mesh template containing the initial position coordinates of each vertex of an original three-dimensional model and connection relationship data between the vertices;
    a feature encoding module, configured to input the original image into a pre-built encoding network for processing and output an initial feature vector corresponding to the original image, the encoding network being a neural network for extracting image features;
    a vector fusion module, configured to fuse the initial feature vector with the initial position coordinates of the vertices to obtain a first feature matrix, the first feature matrix containing a target feature vector corresponding to each of the vertices;
    a feature decoding module, configured to input the first feature matrix into a pre-built decoding network for processing and output a second feature matrix, the second feature matrix containing target position coordinates corresponding to each of the vertices, the decoding network being a neural network containing a fully connected layer and an attention mechanism layer, wherein for each of the vertices the attention mechanism layer fuses the target feature vectors of all the vertices according to the correlation between each vertex and that vertex to obtain a fused target feature vector for that vertex, the fused target feature vector being used to determine the target position coordinates of that vertex;
    a model reconstruction module, configured to reconstruct a target three-dimensional model corresponding to the target object according to the target position coordinates of the vertices and the connection relationship data between the vertices.
  9. A terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when executing the computer program, the processor implements the method for reconstructing an object model according to any one of claims 1 to 7.
  10. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, the method for reconstructing an object model according to any one of claims 1 to 7 is implemented.
PCT/CN2021/093783 2021-05-14 2021-05-14 Method and apparatus for reconstructing object model, and terminal device and storage medium WO2022236802A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/093783 WO2022236802A1 (en) 2021-05-14 2021-05-14 Method and apparatus for reconstructing object model, and terminal device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/093783 WO2022236802A1 (en) 2021-05-14 2021-05-14 Method and apparatus for reconstructing object model, and terminal device and storage medium

Publications (1)

Publication Number Publication Date
WO2022236802A1 true WO2022236802A1 (en) 2022-11-17

Family

ID=84028704

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/093783 WO2022236802A1 (en) 2021-05-14 2021-05-14 Method and apparatus for reconstructing object model, and terminal device and storage medium

Country Status (1)

Country Link
WO (1) WO2022236802A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116863342A (en) * 2023-09-04 2023-10-10 江西啄木蜂科技有限公司 Large-scale remote sensing image-based pine wood nematode dead wood extraction method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110458957A (en) * 2019-07-31 2019-11-15 浙江工业大学 A kind of three-dimensional image model construction method neural network based and device
CN110544297A (en) * 2019-08-06 2019-12-06 北京工业大学 Three-dimensional model reconstruction method for single image
CN110728219A (en) * 2019-09-29 2020-01-24 天津大学 3D face generation method based on multi-column multi-scale graph convolution neural network
US20210012558A1 (en) * 2018-08-28 2021-01-14 Tencent Technology (Shenzhen) Company Limited Method and apparatus for reconstructing three-dimensional model of human body, and storage medium
US20210042557A1 (en) * 2019-08-07 2021-02-11 Here Global B.V. Method, apparatus and computer program product for three dimensional feature extraction from a point cloud
CN112784782A (en) * 2021-01-28 2021-05-11 上海理工大学 Three-dimensional object identification method based on multi-view double-attention network


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116863342A (en) * 2023-09-04 2023-10-10 Jiangxi Zhuomufeng Technology Co., Ltd. Large-scale remote sensing image-based pine wood nematode dead wood extraction method
CN116863342B (en) * 2023-09-04 2023-11-21 Jiangxi Zhuomufeng Technology Co., Ltd. Large-scale remote sensing image-based pine wood nematode dead wood extraction method

Similar Documents

Publication Publication Date Title
CN111369681B (en) Three-dimensional model reconstruction method, device, equipment and storage medium
WO2020199693A1 (en) Large-pose face recognition method and apparatus, and device
WO2020119527A1 (en) Human action recognition method and apparatus, and terminal device and storage medium
CN111325851B (en) Image processing method and device, electronic equipment and computer readable storage medium
JP2022524891A (en) Image processing methods and equipment, electronic devices and computer programs
US20220084163A1 (en) Target image generation method and apparatus, server, and storage medium
KR102612808B1 (en) lighting estimation
US11276218B2 (en) Method for skinning character model, device for skinning character model, storage medium and electronic device
WO2020143513A1 (en) Super-resolution image reconstruction method, apparatus and device
WO2023116231A1 (en) Image classification method and apparatus, computer device, and storage medium
JP7443647B2 (en) Keypoint detection and model training method, apparatus, device, storage medium, and computer program
CN111047509A (en) Image special effect processing method and device and terminal
JP2023029984A (en) Method, device, electronic apparatus, and readable storage medium for generating virtual image
CN114792355B (en) Virtual image generation method and device, electronic equipment and storage medium
WO2022213623A1 (en) Image generation method and apparatus, three-dimensional facial model generation method and apparatus, electronic device and storage medium
WO2022236802A1 (en) Method and apparatus for reconstructing object model, and terminal device and storage medium
CN110163095B (en) Loop detection method, loop detection device and terminal equipment
CN113298931B (en) Reconstruction method and device of object model, terminal equipment and storage medium
CN116385667A (en) Reconstruction method of three-dimensional model, training method and device of texture reconstruction model
US20220392251A1 (en) Method and apparatus for generating object model, electronic device and storage medium
US20220301348A1 (en) Face reconstruction using a mesh convolution network
DE102018129135A1 (en) USE OF REST VIDEO DATA RESULTING FROM A COMPRESSION OF ORIGINAL VIDEO DATA TO IMPROVE A DECOMPRESSION OF ORIGINAL VIDEO DATA
CN114821216A (en) Method for modeling and using picture descreening neural network model and related equipment
CN114820908B (en) Virtual image generation method and device, electronic equipment and storage medium
CN116740300B (en) Multi-mode-based prime body and texture fusion furniture model reconstruction method

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21941372

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE