CN111369681B - Three-dimensional model reconstruction method, device, equipment and storage medium - Google Patents

Three-dimensional model reconstruction method, device, equipment and storage medium

Info

Publication number
CN111369681B
CN111369681B
Authority
CN
China
Prior art keywords
reconstruction
target
network
voxel
feature map
Prior art date
Legal status
Active
Application number
CN202010135051.0A
Other languages
Chinese (zh)
Other versions
CN111369681A (en)
Inventor
向天戈
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202010135051.0A
Publication of CN111369681A
Application granted granted Critical
Publication of CN111369681B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20 Finite element generation, e.g. wire-frame surface description, tesselation

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The application discloses a three-dimensional model reconstruction method, apparatus, device and storage medium, relating to the field of three-dimensional modeling. The method comprises the following steps: acquiring a single target image, where the target image contains an image of a target reconstruction object; performing feature extraction on the target image to obtain a target feature map corresponding to the target image; inputting the target feature map into a mesh reconstruction network and into a voxel reconstruction network, where the mesh reconstruction network and the voxel reconstruction network are semantically connected, the mesh reconstruction network reconstructs a three-dimensional mesh model of the target reconstruction object according to the target feature map, and the voxel reconstruction network reconstructs a voxel model of the target reconstruction object according to the target feature map; and constructing a target three-dimensional model according to the target mesh information output by the mesh reconstruction network. In the embodiments of the application, three-dimensional model reconstruction of the target reconstruction object can be realized with only a single image, which reduces the requirements on the two-dimensional images needed for three-dimensional reconstruction and simplifies the three-dimensional reconstruction flow.

Description

Three-dimensional model reconstruction method, device, equipment and storage medium
Technical Field
The embodiment of the application relates to the field of three-dimensional modeling, in particular to a method, a device, equipment and a storage medium for reconstructing a three-dimensional model.
Background
Three-dimensional reconstruction is a technique for constructing a three-dimensional model based on two-dimensional images.
In the related art, when performing three-dimensional reconstruction based on two-dimensional images, it is necessary to acquire a plurality of two-dimensional images, which are obtained by shooting the same reconstruction object from different angles. The three-dimensional voxel model corresponding to the reconstruction object is constructed by encoding a plurality of two-dimensional images (namely, a feature extraction process) and then decoding the encoding result (namely, a process of reconstructing the three-dimensional voxel based on the image features).
However, three-dimensional reconstruction in the above manner first requires capturing a plurality of two-dimensional images, which complicates the reconstruction flow; moreover, only reconstruction objects that physically exist can be reconstructed, which limits the application scenarios of three-dimensional reconstruction.
Disclosure of Invention
The embodiments of the application provide a three-dimensional model reconstruction method, apparatus, device and storage medium, which can realize three-dimensional model reconstruction from a single image, reduce the requirements on two-dimensional images during three-dimensional model reconstruction, and simplify the three-dimensional reconstruction process. The technical scheme is as follows:
in one aspect, an embodiment of the present application provides a method for reconstructing a three-dimensional model, where the method includes:
obtaining a single target image, wherein the target image comprises an image of a target reconstruction object;
performing feature extraction on the target image to obtain a target feature map corresponding to the target image;
inputting the target feature map into a mesh reconstruction network, and inputting the target feature map into a voxel reconstruction network, wherein the mesh reconstruction network and the voxel reconstruction network are connected semantically, the mesh reconstruction network is used for reconstructing a three-dimensional mesh model of the target reconstruction object according to the target feature map, and the voxel reconstruction network is used for reconstructing a voxel model of the target reconstruction object according to the target feature map;
and constructing a target three-dimensional model according to target mesh information output by the mesh reconstruction network, wherein the target mesh information is information of the mesh vertices in the target three-dimensional model.
In another aspect, an embodiment of the present application provides an apparatus for reconstructing a three-dimensional model, where the apparatus includes:
the image acquisition module is used for acquiring a single target image, wherein the target image comprises an image of a target reconstruction object;
the first extraction module is used for extracting the features of the target image to obtain a target feature map corresponding to the target image;
an input module, configured to input the target feature map into a mesh reconstruction network, and input the target feature map into a voxel reconstruction network, where the mesh reconstruction network and the voxel reconstruction network are connected semantically, the mesh reconstruction network is configured to reconstruct a three-dimensional mesh model of the target reconstruction object according to the target feature map, and the voxel reconstruction network is configured to reconstruct a voxel model of the target reconstruction object according to the target feature map;
and the reconstruction module is used for constructing a target three-dimensional model according to target mesh information output by the mesh reconstruction network, wherein the target mesh information is information of the mesh vertices in the target three-dimensional model.
In another aspect, embodiments of the present application provide a computer device, which includes a processor and a memory, where at least one instruction, at least one program, a set of codes, or a set of instructions is stored in the memory, and the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by the processor to implement the method for reconstructing a three-dimensional model according to the above aspect.
In another aspect, there is provided a computer readable storage medium having stored therein at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by a processor to implement the method of reconstructing a three-dimensional model as described in the above aspect.
In another aspect, a computer program product is provided, which, when run on a computer, causes the computer to perform the method of reconstructing a three-dimensional model according to the above aspect.
The beneficial effects brought by the technical scheme provided by the embodiment of the application at least comprise:
Unlike the related art, in which three-dimensional reconstruction must be performed based on a plurality of two-dimensional images taken at different angles, in the embodiment of the application the computer device needs to acquire only a single target image: it performs feature extraction on the target image, performs mesh reconstruction based on the feature map using a mesh reconstruction network, performs voxel reconstruction based on the feature map using a voxel reconstruction network, and then completes the three-dimensional model reconstruction according to the target mesh information output by the mesh reconstruction network. This reduces the requirements on the two-dimensional images needed for three-dimensional reconstruction, simplifies the flow of three-dimensional reconstruction, and expands the application scenarios of three-dimensional reconstruction (an object that does not physically exist can be reconstructed). In addition, because the mesh reconstruction network and the voxel reconstruction network are semantically connected, features can be fused between the networks, which improves the accuracy of the finally output target mesh information and hence the reconstruction quality of three-dimensional reconstruction based on a single image.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic diagram illustrating a method for reconstructing a three-dimensional model according to an exemplary embodiment of the present application;
FIG. 2 is a schematic diagram of an implementation of the method for reconstructing a three-dimensional model in a game application;
FIG. 3 illustrates a schematic diagram of an implementation environment provided by an exemplary embodiment of the present application;
FIG. 4 illustrates a flow chart of a method for reconstructing a three-dimensional model provided by an exemplary embodiment of the present application;
FIG. 5 is a schematic diagram illustrating an implementation process of a method for reconstructing a three-dimensional model according to an exemplary embodiment of the present application;
FIG. 6 illustrates a flow chart of a method for reconstructing a three-dimensional model provided by another exemplary embodiment of the present application;
FIG. 7 is a schematic diagram of the manner in which semantic connections are made between a first decoder and a second decoder;
FIG. 8 is a flow chart of a network training process provided by an exemplary embodiment of the present application;
FIG. 9 is a schematic diagram illustrating an implementation of a network training process provided by an exemplary embodiment of the present application;
FIG. 10 is a flow chart of a network training process provided by another exemplary embodiment of the present application;
FIG. 11 is a block diagram of an apparatus for reconstructing a three-dimensional model according to an exemplary embodiment of the present application;
FIG. 12 shows a schematic structural diagram of a computer device provided in an exemplary embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Computer Vision (CV) is a science that studies how to make machines "see": using cameras and computers in place of human eyes to identify, track, and measure targets, and further performing image processing so that the result is an image better suited for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technology generally includes image processing, image recognition, image semantic understanding, image retrieval, Optical Character Recognition (OCR), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, and other technologies, as well as common biometric technologies such as face recognition and fingerprint recognition.
The three-dimensional model reconstruction method provided by the embodiments of the application is an application of computer vision technology to the scenario of three-dimensional object reconstruction. With this method, as shown in fig. 1, when the reconstruction object in an image 11 (an airplane in fig. 1) needs to be reconstructed three-dimensionally, feature extraction is first performed on the image 11 through a feature extraction network 12 to obtain a feature map 13. From the extracted feature map 13, three-dimensional mesh model reconstruction is performed by a mesh reconstruction network 14, and voxel model reconstruction is performed by a voxel reconstruction network 15. To improve the reconstruction quality of three-dimensional reconstruction based on a single image, the mesh reconstruction network 14 and the voxel reconstruction network 15 are semantically connected (semantically fused), so that when the mesh reconstruction network 14 reconstructs the three-dimensional mesh model, it can fuse in the voxel features extracted by the voxel reconstruction network 15, further improving its reconstruction quality. Finally, the three-dimensional model 16 is constructed according to the mesh information output by the mesh reconstruction network 14.
The application scenarios of the reconstruction method of the three-dimensional model provided by the embodiment of the application can include the following two scenarios:
One, adding a new object in an application program
The application program may be a virtual reality application program, a three-dimensional map application program, a military simulation program, a first person shooter game, or a Multiplayer Online Battle Arena game (MOBA). Taking a game as an example, as shown in fig. 2, when a user needs to add a custom virtual weapon in the game, the user selects and uploads a picture 22 containing an image of the custom weapon in a custom weapon interface 21, and the application program reconstructs a three-dimensional model 23 of the custom weapon according to the picture 22. When the user chooses to use the custom weapon, the three-dimensional model 23 of the custom weapon is displayed in the game screen 24 during the game, and accordingly, the user can control the virtual object in the game to attack with the custom weapon.
Two, 3D printing
When the method for reconstructing a three-dimensional model is applied to 3D printing, it may be implemented as part or all of a 3D printing application. After a picture of the object to be printed is input into the 3D printing application, the application performs three-dimensional model reconstruction according to the picture to obtain a mesh model of the object to be printed. Further, the 3D printing application sends the model data of the mesh model to a 3D printing device, and the 3D printing device performs 3D printing according to the model data. Compared with the related art, in which a series of complex parameters and an abstract description of the object must be input into the 3D printing device, this method completes 3D printing with only a single input picture, simplifying the 3D printing process and reducing the difficulty of realizing 3D printing.
Of course, the above description is only given by taking two typical application scenarios as examples, and the method provided in the embodiment of the present application may be applied to other scenarios that require a three-dimensional model reconstruction technology (for example, game application development, virtual reality technology, and the like), and the embodiment of the present application does not limit a specific application scenario.
The method for reconstructing the three-dimensional model provided by the embodiment of the application can be applied to computer equipment such as a terminal or a server. In a possible implementation manner, the method for reconstructing a three-dimensional model provided in the embodiment of the present application may be implemented as an application program or a part of an application program, and installed in a terminal, so that the terminal has a function of reconstructing a three-dimensional model from a single image; or the method can be applied to a background server of the application program, so that the server provides the three-dimensional model reconstruction service for the application program in the terminal.
Referring to fig. 3, a schematic diagram of an implementation environment provided by an exemplary embodiment of the present application is shown. The implementation environment includes a terminal 310 and a server 320, where the terminal 310 and the server 320 perform data communication through a communication network, optionally, the communication network may be a wired network or a wireless network, and the communication network may be at least one of a local area network, a metropolitan area network, and a wide area network.
The terminal 310 is installed with an application program required for reconstructing a three-dimensional model, where the application program may be a virtual reality application program, a three-dimensional map application program, a military simulation program, a first person shooter game, a 3D printing application program, or another Artificial Intelligence (AI) application program applied in the field of reconstructing a three-dimensional model, and the application is not limited in this embodiment.
Optionally, the terminal 310 may be a mobile terminal such as a tablet computer or a laptop computer, or a terminal such as a desktop computer or a projection computer, which is not limited in this embodiment of the application.
The server 320 may be implemented as one server, or may be implemented as a server cluster formed by a group of servers, which may be physical servers or cloud servers. In one possible implementation, server 320 is a backend server for applications in terminal 310.
As shown in fig. 3, after the terminal 310 uploads a single image to the server 320 over the network, the server 320 first performs feature extraction on the image to obtain a feature map 321, then reconstructs the three-dimensional mesh model and the voxel model through the semantically connected mesh reconstruction network 322 and voxel reconstruction network 323, and finally reconstructs the three-dimensional model 325 according to the mesh information 324 output by the mesh reconstruction network 322, feeding the three-dimensional model 325 back to the terminal 310 (its model data may be fed back).
In other possible embodiments, the process of reconstructing the three-dimensional model may also be performed locally (i.e., by the terminal 310), without the aid of the server 320, which is not limited in this embodiment.
For convenience of description, the following embodiments are described as examples in which a method of reconstructing a three-dimensional model is executed by a computer device.
Referring to fig. 4, a flowchart of a method for reconstructing a three-dimensional model according to an exemplary embodiment of the present application is shown. The embodiment is described by taking the method as an example for a computer device, and the method comprises the following steps.
Step 401, a single target image is obtained, and the target image includes an image of a target reconstruction object.
The target image is a two-dimensional image, which may be a grayscale image (each pixel contains a gray value), a Red-Green-Blue (RGB) image (each pixel contains an RGB value), or the like.
Also, the target reconstruction object included in the target image may be a person, an animal, a building, a vehicle, a weapon, a static object, or the like, and the specific type of the target reconstruction object is not limited in this embodiment.
Alternatively, when the target image includes images of a plurality of objects, the computer device may determine the position of the target reconstruction object in the target image according to a manual instruction, or identify the position automatically (for example, by determining the object occupying the largest proportion of the image, or an object of a specified type, as the target reconstruction object).
Optionally, the target image may be a shot image, a video screenshot image, a hand-drawn image, or the like, which is not limited in this embodiment.
And 402, extracting the features of the target image to obtain a target feature map corresponding to the target image.
In a possible implementation manner, a feature extraction network is preset in the computer device, and after the target image is preprocessed (for example, the image size is adjusted), the computer device performs feature extraction on the target image through the feature extraction network to obtain an output target feature map. The feature extraction network may be a convolutional neural network composed of a plurality of convolutional layers, which is not limited in this embodiment.
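For illustration, a minimal sketch of such a feature extraction network is given below in PyTorch. The layer count, strides, and channel sizes are assumptions chosen for the example; this embodiment does not fix a specific architecture.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Minimal convolutional feature extraction network (illustrative;
    the actual layer configuration is an assumption, not specified here)."""
    def __init__(self, in_channels: int = 3, out_channels: int = 1000):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 256, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, out_channels, kernel_size=3, stride=2, padding=1),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # image: (N, 3, H, W) -> multi-channel target feature map (N, 1000, H/8, W/8)
        return self.layers(image)
```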
And 403, inputting the target feature map into a grid reconstruction network, inputting the target feature map into a voxel reconstruction network, wherein the grid reconstruction network and the voxel reconstruction network are connected semantically, the grid reconstruction network is used for reconstructing a three-dimensional grid model of the target reconstruction object according to the target feature map, and the voxel reconstruction network is used for reconstructing a voxel model of the target reconstruction object according to the target feature map.
A mesh is, in three-dimensional computer graphics, a set of vertices and polygons representing a polyhedral shape, also called an unstructured grid; it is generally composed of simple convex polygons such as triangles and quadrilaterals. The mesh reconstruction network in the embodiment of the application is used for outputting mesh information of the mesh vertices in a mesh model according to an input feature map.
A voxel is an abbreviation of volume element, the smallest unit of segmentation in three-dimensional space (analogous to the pixel, the smallest unit of segmentation in two-dimensional space). The voxel reconstruction network in the embodiment of the present application is used for outputting voxel information of the voxels in the voxel model according to the input feature map. Compared with a mesh model, the surface of a voxel model is less smooth, and its accuracy is lower than that of the mesh model owing to the lack of nodes and of information between nodes.
Compared with reconstructing a three-dimensional model based on a plurality of images at different angles, reconstructing from a single image yields fewer extracted features. Therefore, to improve the reconstruction quality of the model, in the embodiment of the application the computer device processes the target feature map through the mesh reconstruction network and the voxel reconstruction network respectively, and semantically connects the two networks, so that the voxel information produced in the voxel reconstruction network can be fused into the mesh vertex reconstruction process of the mesh reconstruction network, improving the accuracy of the reconstructed mesh vertices.
Optionally, after semantic connection is performed between the mesh reconstruction network and the voxel reconstruction network, voxel features in the voxel reconstruction network are fused with graph node features of mesh vertices in the mesh reconstruction network through mapping, and are used for mesh (vertex) reconstruction of the mesh reconstruction network, so that the mesh reconstruction accuracy is improved.
Of course, since a mesh model is ultimately to be constructed in the embodiment of the present application, the features of the voxel reconstruction network need to be fused into the mesh reconstruction network; in other possible embodiments, when a voxel model is ultimately to be constructed, the features of the mesh reconstruction network may instead be fused into the voxel reconstruction network, which is not limited in this embodiment.
Step 404, constructing a target three-dimensional model according to the target mesh information output by the mesh reconstruction network, wherein the target mesh information is information of the mesh vertices in the target three-dimensional model.
In a possible implementation, the mesh reconstruction network outputs target mesh information based on the target feature map, and the voxel reconstruction network outputs voxel information based on the target feature map; when reconstructing the three-dimensional model, the computer device constructs the target three-dimensional model (a three-dimensional mesh model) using the target mesh information.
Optionally, the target mesh information includes the vertex coordinates (three-dimensional space coordinates) of the mesh vertices, and may further include the connection relationships between the mesh vertices.
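As a concrete illustration of how such mesh information can be turned into a model file, the sketch below serializes vertex coordinates and connection relationships in Wavefront OBJ format. The choice of OBJ is an assumption made for the example; this embodiment does not prescribe an output format.

```python
def write_obj(path: str, vertices, faces) -> None:
    """Write mesh information (vertex coordinates plus connectivity) as a
    Wavefront OBJ file; OBJ face indices are 1-based, so 0-based face
    indices are shifted by one. Illustrative sketch only."""
    with open(path, "w") as f:
        for x, y, z in vertices:          # vertex coordinates (3D space coordinates)
            f.write(f"v {x} {y} {z}\n")
        for a, b, c in faces:             # connection relationships between vertices
            f.write(f"f {a + 1} {b + 1} {c + 1}\n")
```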
Optionally, when the target image obtained is an RGB image, the computer device may color the target three-dimensional model according to the RGB image, or add a model texture to the target three-dimensional model, which is not limited in this embodiment.
In summary, unlike the related art, in which three-dimensional reconstruction must be performed based on a plurality of two-dimensional images taken at different angles, in the embodiment of the present application the computer device needs to acquire only a single target image: it performs feature extraction on the target image, performs mesh reconstruction based on the feature map using a mesh reconstruction network, performs voxel reconstruction based on the feature map using a voxel reconstruction network, and then completes the three-dimensional model reconstruction according to the target mesh information output by the mesh reconstruction network. This reduces the requirements on the two-dimensional images needed for three-dimensional reconstruction, simplifies the flow of three-dimensional reconstruction, and expands the application scenarios of three-dimensional reconstruction (an object that does not physically exist can be reconstructed). In addition, because the mesh reconstruction network and the voxel reconstruction network are semantically connected, features can be fused between the networks, which improves the accuracy of the finally output target mesh information and hence the reconstruction quality of three-dimensional reconstruction based on a single image.
In one possible embodiment, when the above method is applied to an application program supporting a virtual environment, after the target three-dimensional model is constructed, the computer device displays the target three-dimensional model in the virtual environment, wherein the target three-dimensional model may be at least one of a virtual prop (such as a virtual external weapon), a virtual vehicle (such as a virtual airplane, a virtual automobile) or a virtual object (such as a virtual character, a virtual animal and the like) in the virtual environment; when the above-described method is used for a stereoscopic printing application (such as a 3D printer), the computer device transmits model data of the target three-dimensional model to the stereoscopic printing device so that the stereoscopic printing device prints the solid model according to the model data.
In one possible implementation, as shown in fig. 5, the mesh reconstruction network is composed of an adjacency extraction sub-network 504, a keypoint extraction sub-network 506, and a first decoder 508, while the voxel reconstruction network is composed of a second decoder 509, and the first decoder 508 and the second decoder 509 are semantically connected. After feature extraction is performed on the target image 501 by the encoder 502 to obtain a target feature map 503, a keypoint feature map 507 is extracted from the target feature map 503 by the keypoint extraction sub-network 506, and an adjacency matrix 505 indicating the adjacency of feature points is extracted by the adjacency extraction sub-network 504; the adjacency matrix 505 and the keypoint feature map 507 are then input into the first decoder 508 to complete the reconstruction of the mesh model. Meanwhile, the target feature map 503 is input into the second decoder 509; target mesh information is output according to the semantic connection of the first decoder 508 and the second decoder 509, and the target three-dimensional model 510 is obtained through reconstruction. The following description uses exemplary embodiments.
Referring to fig. 6, a flow chart of a method for reconstructing a three-dimensional model according to another exemplary embodiment of the present application is shown. The embodiment is described by taking the method as an example for a computer device, and the method comprises the following steps.
Step 601, obtaining a single target image, wherein the target image includes an image of a target reconstruction object.
The step 401 may be referred to in the implementation manner of this step, and this embodiment is not described herein again.
Step 602, performing feature extraction on the target image to obtain a target feature map corresponding to the target image.
Illustratively, as shown in fig. 5, the computer device inputs the target image 501 into the encoder 502; the convolution layers in the encoder 502 perform feature extraction on the target image (gradually extracting shallow and then deep features) and output the target feature map 503. The target feature map is a multi-channel feature map.
After completing the feature map extraction, the computer device performs mesh reconstruction by the following steps 603 to 605, and performs voxel reconstruction by the following step 606. It should be noted that there is no strict sequence between steps 603 to 605 and step 606, and this embodiment is described by taking the synchronous execution of steps 603 to 605 and step 606 as an example, but the invention is not limited thereto.
Step 603, inputting the target feature map into the key point extraction sub-network to obtain a key point feature map output by the key point extraction sub-network, wherein the key point feature map comprises image features of key feature points in the target image.
The target feature map obtained through feature extraction contains a large number of channels (each channel corresponds to one feature map), among which a small number correspond to key feature points (feature points with a high response to the convolution kernels during feature extraction) and a large number correspond to non-key feature points (feature points with a low response to the convolution kernels). If subsequent mesh reconstruction were performed directly on the full target feature map, the amount of computation would be too large and would contain a large amount of invalid computation, which would slow model reconstruction and waste computing resources.
In order to reduce the amount of computation and improve the speed of model reconstruction, optionally, the computer device extracts, through the keypoint extraction sub-network, the keypoint feature maps (also called keypoint layers) corresponding to the key feature points from the target feature map, thereby limiting the number of feature map channels. The number of keypoint feature maps may be a preset number, for example, 100.
In one possible implementation, the computer device trains the keypoint extraction sub-network (a convolutional network) in advance; its input is the target feature map, and its output is the preset number of keypoint feature maps.
Illustratively, when the number of the keypoint feature maps is 100, the computer device trains the keypoint extraction sub-network whose output channels are 100, and during the application process, inputs a target feature map (for example, containing 1000 channels) into the keypoint extraction sub-network, and obtains the outputted keypoint feature maps of 100 channels.
In another possible implementation, since key feature points respond to the convolution kernels more strongly than non-key feature points during feature extraction, the computer device may compute, for each channel, the sum of the absolute values of the feature values in its feature map, select the top k feature maps with the largest sums, and determine these k feature maps as the keypoint feature maps.
Illustratively, when the target feature map is 5 × 5 × 1000 (1000 being the number of channels), the computer device calculates the sum of the absolute values of the 25 feature values in each channel's feature map, obtains 1000 absolute-value sums, and selects the 100 feature maps with the largest sums as the keypoint feature maps.
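A minimal sketch of this screening-based variant, assuming a (channels, height, width) tensor layout and k = 100 as in the example above:

```python
import torch

def select_keypoint_feature_maps(feature_map: torch.Tensor, k: int = 100):
    """Keep the k channels whose feature maps have the largest sum of
    absolute feature values (e.g., 5 x 5 x 1000 -> the 100 strongest)."""
    scores = feature_map.abs().sum(dim=(1, 2))   # one score per channel: (C,)
    topk = torch.topk(scores, k).indices         # indices of the k largest sums
    return feature_map[topk], topk               # (k, H, W) maps and their channels
```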
Of course, in addition to obtaining the feature map of the key point by using the above two methods, the computer device may also extract the feature map corresponding to the key feature point by using other possible methods, which is not limited in this embodiment.
After the number of channels of the feature map is limited, on the premise of keeping the feature map corresponding to the key feature point, the calculation amount of the feature map in the subsequent processing is reduced, the quality of subsequent model reconstruction is ensured, and the speed of the model reconstruction is increased.
And step 604, inputting the target feature map into the adjacency relation extraction sub-network to obtain an adjacency matrix output by the adjacency relation extraction sub-network, wherein the adjacency matrix is used for representing the adjacency relation of the feature points in the target reconstruction object.
In addition to determining the feature maps of the key points, the computer device needs to extract the adjacency relationship between the feature points in the target reconstruction object based on the target feature map, where the adjacency relationship is represented by an adjacency matrix.
An Adjacency Matrix is a matrix representing the adjacency between vertices; its elements are symmetric about the main diagonal. For a matrix element A_ij of an i × j adjacency matrix: if the value of A_ij is 0, the i-th vertex and the j-th vertex are not adjacent (i.e., no edge directly connects the two vertices); if the value of A_ij is 1, the i-th vertex and the j-th vertex are adjacent. A_ij and A_ji have the same value.
In the embodiment of the application, the computer device extracts, through the adjacency extraction sub-network, the adjacency between the feature points based on the target feature map. The computer device may obtain the adjacency matrix by collapsing the two-dimensional space.
In one possible embodiment, the computer device applies a linear change to the target feature map through the adjacency extraction sub-network to obtain a feature matrix A, and then computes the adjacency matrix corresponding to the target feature map from A and its transpose A^T; that is, the adjacency matrix is A × A^T.
Of course, in addition to obtaining the adjacency matrix in the above manner, the computer device may also extract the adjacency relation between the feature points in other possible manners, which is not limited in this embodiment.
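A sketch of the A × A^T construction, under the assumption that the linear change is a single learned linear layer applied after collapsing the two spatial dimensions; the hidden dimension is an arbitrary example value:

```python
import torch
import torch.nn as nn

class AdjacencyExtractor(nn.Module):
    """Produce an adjacency matrix as A x A^T from the target feature map
    (illustrative; the exact linear transformation is an assumption)."""
    def __init__(self, in_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.linear = nn.Linear(in_dim, hidden_dim)

    def forward(self, feature_map: torch.Tensor) -> torch.Tensor:
        # feature_map: (C, H, W); collapse the two-dimensional space to (C, H*W)
        flat = feature_map.flatten(start_dim=1)
        a = self.linear(flat)                # feature matrix A: (C, hidden_dim)
        return a @ a.transpose(0, 1)         # A x A^T: (C, C), symmetric
```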
It should be noted that there is no strict precedence relationship between the above steps 603 and 604, and this embodiment is described by taking the synchronous execution of the steps 603 and 604 as an example, but the present invention is not limited to this configuration.
Step 605, the first decoder performs graph convolution processing on the keypoint feature graph and the adjacency matrix.
Optionally, the first decoder is composed of a plurality of graph convolution layers, each of which performs graph convolution processing on its input to output features of the reconstructed mesh vertices (i.e., graph node features); the final output of the first decoder is the mesh information of the reconstructed mesh vertices.
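For context, one graph convolution layer of the kind such a decoder could stack is sketched below. The propagation rule (aggregate neighbour features through the adjacency matrix, then apply a learned projection) is the common form and is assumed here; this embodiment does not fix a specific formulation.

```python
import torch
import torch.nn as nn

class GraphConvLayer(nn.Module):
    """One graph convolution layer over mesh-vertex (graph node) features."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, node_feats: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # node_feats: (K, in_dim); adj: (K, K), self-loops assumed included
        aggregated = adj @ node_feats        # aggregate each vertex's neighbours
        return torch.relu(self.proj(aggregated))
```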
Since the adjacency matrix obtained in step 604 characterizes the adjacency between all feature points, while step 603 screens out the keypoint feature maps corresponding to the key feature points, the following steps are further included when the first decoder performs graph convolution processing.
Firstly, determining key feature points according to feature values of all positions in a key point feature map.
Because key feature points respond strongly to the convolution kernels, optionally, the computer device determines the position where the feature value (in absolute value) in each keypoint feature map is largest, and determines the feature point (spatial position) corresponding to that position as a key feature point.
And secondly, determining a key point adjacency matrix according to the key feature points and the adjacency matrix, wherein the key point adjacency matrix is used for representing the adjacency relation between the key feature points.
Further, the computer device filters matrix elements corresponding to non-key feature points in the adjacency matrix according to the key feature points to obtain a key point adjacency matrix indicating the adjacency relation between the key feature points, wherein the key point adjacency matrix is a k × k matrix, and k is the number of the key feature points.
And thirdly, carrying out graph convolution processing on the key point feature graph and the key point adjacency matrix through a first decoder.
After the key point adjacency matrix is obtained through the steps, the computer device inputs the key point adjacency matrix (key feature point adjacency relation) and the key point feature map (key feature point feature) into the first decoder, and the first decoder performs the map convolution processing.
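A minimal sketch of the three steps above, assuming each of the k keypoint channels contributes one key feature point and the full adjacency matrix is indexed by channel (both indexing conventions are assumptions):

```python
import torch

def keypoint_adjacency(keypoint_maps: torch.Tensor,
                       adjacency: torch.Tensor,
                       keypoint_channels: torch.Tensor):
    """Locate the key feature points, then filter the full adjacency matrix
    down to the k x k keypoint adjacency matrix."""
    k, h, w = keypoint_maps.shape
    # 1) the position with the largest |feature value| in each keypoint map
    flat_idx = keypoint_maps.abs().flatten(start_dim=1).argmax(dim=1)     # (k,)
    rows = torch.div(flat_idx, w, rounding_mode="floor")
    key_points = torch.stack((rows, flat_idx % w), dim=1)                 # (k, 2)
    # 2) keep only rows/columns of the channels that produced key points
    keypoint_adj = adjacency[keypoint_channels][:, keypoint_channels]     # (k, k)
    return key_points, keypoint_adj   # 3) both are fed to the first decoder
```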
And 606, performing three-dimensional convolution processing on the target feature map through a second decoder.
Optionally, the second decoder is composed of a plurality of three-dimensional convolution layers (3D convolution layers), each of the three-dimensional convolution layers is used for performing three-dimensional convolution processing on the input to output the three-dimensional characteristics of the reconstructed voxel, and the final output of the second decoder is the voxel information of the reconstructed voxel.
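For illustration, one stage of such a voxel decoder is sketched below. The use of a transposed (up-sampling) three-dimensional convolution is an assumption; this embodiment specifies only that each layer performs three-dimensional convolution processing.

```python
import torch
import torch.nn as nn

class VoxelDecoderStage(nn.Module):
    """One 3D convolution stage of the second decoder (illustrative)."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.deconv = nn.ConvTranspose3d(in_ch, out_ch, kernel_size=4,
                                         stride=2, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, D, H, W) -> three-dimensional features (N, out_ch, 2D, 2H, 2W)
        return torch.relu(self.deconv(x))
```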
In the embodiment of the present application, the first decoder and the second decoder are connected semantically, and the output features of the three-dimensional convolution layers in the second decoder can be fused with the output features of the graph convolution layers in the first decoder. Schematically, the semantic connection between the first decoder and the second decoder is shown in fig. 7. During decoding by the first decoder and the second decoder, the semantic fusion of the intermediate decoding results proceeds as described in the following steps 607 to 609.
Step 607, obtaining the three-dimensional characteristics output by the nth three-dimensional convolution layer in the second decoder, wherein the three-dimensional characteristics are used for representing the characteristics of the reconstructed voxel, and n is an integer greater than or equal to 1.
In one possible embodiment, the three-dimensional features output by a three-dimensional convolution layer are represented as a feature map of W (width) × H (height) × D (depth) × C (channels). The nth three-dimensional convolution layer outputs its three-dimensional features after performing three-dimensional convolution processing on its input, and the computer device acquires these features for subsequent feature fusion, so that the features of voxel reconstruction can be fused in when performing mesh reconstruction.
Step 608, obtaining graph node characteristics output by the nth graph convolution layer in the first decoder, wherein the graph node characteristics are used for characterizing the characteristics of the reconstructed mesh vertex.
Optionally, the graph convolution layer outputs vertex coordinates of vertices of the reconstructed mesh in addition to the graph node feature. Similar to the above steps, the computer device obtains graph node features output by the nth graph convolutional layer corresponding to the nth three-dimensional convolutional layer for subsequent feature fusion.
And step 609, performing feature fusion on the three-dimensional features and the graph node features, and inputting the fused features into the (n + 1) th graph convolutional layer.
For the acquired three-dimensional features and graph node features, the computer device performs feature fusion and inputs the fused features into the (n+1)th graph convolution layer for graph convolution processing. Because the fused features input to the (n+1)th graph convolution layer incorporate the voxel features, i.e., their feature dimensions are richer, the graph node features output after graph convolution by the (n+1)th layer are more accurate; correspondingly, the vertex coordinates of the reconstructed mesh vertices fit the target reconstruction object more closely.
Regarding the manner of feature fusion, in one possible embodiment, this step includes the following steps.
Firstly, obtaining the vertex coordinates of the reconstructed mesh vertices corresponding to the graph node features.
The process of fusing three-dimensional features to graph node features may be understood as the projection of features of a three-dimensional space into a two-dimensional space (node pooling process). Therefore, before feature fusion, the computer device needs to acquire vertex coordinates of the vertices of the reconstruction mesh corresponding to the feature of the graph node, so as to project the feature at the vertex coordinates in the three-dimensional space to the two-dimensional space.
Optionally, the vertex coordinates are output by the nth map convolutional layer, denoted as (x, y, z).
And secondly, performing a linear interpolation operation on the three-dimensional features around the vertex coordinates, and fusing the features obtained by the linear interpolation operation with the graph node features.
In a possible embodiment, after the vertex coordinates are obtained, the computer device obtains, according to the vertex coordinates, the three-dimensional features corresponding to the voxels around the vertex coordinates (for example, the 8 surrounding voxels), and performs a linear interpolation operation on the obtained three-dimensional features to obtain the features to be fused.
Further, the computer device performs feature fusion on the interpolated features and the graph node features (feature fusion may be performed by concatenation (concat)), and inputs the result into the next graph convolution layer.
Illustratively, as shown in fig. 7, after pooling the three-dimensional features output by the three-dimensional convolutional layer 71 with nodes, the computer device performs feature fusion on the pooled features with the graph node features output by the graph convolutional layer 74, and inputs the fused features into the graph convolutional layer 75; similarly, after the computer device performs node pooling on the three-dimensional features output by the three-dimensional convolutional layer 72, the node pooled features are feature fused with the map node features output by the map convolutional layer 75, and the fused features are input to the map convolutional layer 76.
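A sketch of this node pooling and fusion step, assuming the vertex coordinates have already been normalized to the [-1, 1] range expected by grid_sample (which performs trilinear interpolation over the 8 surrounding voxels for five-dimensional input):

```python
import torch
import torch.nn.functional as F

def pool_and_fuse(voxel_feats: torch.Tensor,
                  vertex_coords: torch.Tensor,
                  node_feats: torch.Tensor) -> torch.Tensor:
    """Trilinearly interpolate the 3D features at each mesh vertex, then
    concatenate them with the graph node features (node pooling + concat)."""
    # voxel_feats: (1, C, D, H, W); vertex_coords: (V, 3) as (x, y, z) in [-1, 1]
    grid = vertex_coords.view(1, -1, 1, 1, 3)               # (1, V, 1, 1, 3)
    sampled = F.grid_sample(voxel_feats, grid,
                            mode="bilinear", align_corners=True)
    sampled = sampled.view(voxel_feats.shape[1], -1).t()    # (V, C)
    return torch.cat((sampled, node_feats), dim=1)          # (V, C + C_node)
```

The fused tensor is what would be passed to the (n+1)th graph convolution layer in place of the plain graph node features.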
Step 610, constructing a target three-dimensional model according to target mesh information output by the first decoder, wherein the target mesh information is information of mesh vertexes in the target three-dimensional model.
When reconstructing the three-dimensional model, the computer device uses the target mesh information output by the first decoder to construct the target three-dimensional model.
In this embodiment, the computer device uses the keypoint extraction sub-network to limit the number of channels of the target feature map, obtaining the keypoint feature maps corresponding to the key feature points, and then performs graph convolution processing on the keypoint feature maps and the keypoint adjacency matrix to implement mesh reconstruction. This reduces the amount of computation during reconstruction and improves the speed of model reconstruction while ensuring reconstruction quality.
In addition, in this embodiment, the computer device performs node pooling on the three-dimensional features output by the three-dimensional convolution layers in the second decoder and fuses them with the graph node features output by the corresponding graph convolution layers, realizing complementarity between voxel features and mesh features and further improving the quality of model reconstruction.
Regarding the above-mentioned training method for the mesh reconstruction network and the voxel reconstruction network, in one possible embodiment, as shown in fig. 8, the training process includes the following steps.
Step 801, performing feature extraction on the sample image to obtain a sample feature map corresponding to the sample image, where the sample image includes an image of a sample reconstruction object.
The process of extracting the features of the sample image by the computer device may refer to the process of extracting the features of the target image in the application stage, which is not described herein again in this embodiment.
Step 802, inputting the sample feature map into the mesh reconstruction network, and inputting the sample feature map into the voxel reconstruction network.
Similar to the application stage, in the training stage the computer device inputs the sample feature map into the mesh reconstruction network and into the voxel reconstruction network (the networks are semantically connected in the same way as in the application stage); the mesh reconstruction network reconstructs a mesh model of the sample reconstruction object, and the voxel reconstruction network reconstructs a voxel model of the sample reconstruction object.
Step 803, obtaining the sample mesh information output by the mesh reconstruction network and the sample voxel information output by the voxel reconstruction network.
Because the voxel features of the voxel reconstruction network are fused into the mesh reconstruction network, the accuracy of the voxel information output by the voxel reconstruction network also affects the accuracy of the mesh information output by the mesh reconstruction network. Therefore, in the training stage, the computer device needs to train the two networks synchronously based on the sample mesh information output by the mesh reconstruction network and the sample voxel information output by the voxel reconstruction network.
Optionally, the sample mesh information includes the vertex coordinates corresponding to each mesh vertex in the reconstructed mesh model, and the sample voxel information includes the voxel coordinates corresponding to each voxel in the reconstructed voxel model.
Step 804, determining the network loss according to the sample mesh information, the sample voxel information and a standard three-dimensional model, where the standard three-dimensional model is the three-dimensional model corresponding to the sample reconstruction object.
In one possible embodiment, the computer device determines a mesh reconstruction loss from the sample mesh information and the standard three-dimensional model (its corresponding standard mesh information) through a loss function, determines a voxel reconstruction loss from the sample voxel information and the standard three-dimensional model (its corresponding standard voxel information), and takes the mesh reconstruction loss and the voxel reconstruction loss together as the network loss.
Optionally, the sample image may be an image obtained by performing two-dimensional image acquisition on the standard three-dimensional model; or, the standard three-dimensional model may be a three-dimensional model artificially constructed based on a sample reconstruction object in a sample image, and is used as supervision in a network training process.
Step 805, training the mesh reconstruction network and the voxel reconstruction network according to the network loss.
Optionally, the computer device performs network training in an end-to-end manner; that is, the network parameters in the mesh reconstruction network and the voxel reconstruction network are trained through a back propagation algorithm or a gradient descent algorithm according to the network loss, until the network loss meets a convergence condition and training is complete.
In a possible embodiment, as shown in fig. 9, a sample image 901 is encoded by an encoder 902 to obtain a sample feature map 903, a mesh reconstruction network 904 performs mesh model reconstruction based on the sample feature map 903 to output sample mesh information 905, and a voxel reconstruction network 906 performs voxel model reconstruction based on the sample feature map 903 to output sample voxel information 907. It can be seen that the quality of feature extraction of the encoder 902 is particularly important since both subsequent mesh and voxel reconstruction depend on the sample feature map 903. Therefore, the computer device needs to synchronously train the encoder while training the network.
In one possible implementation, as illustrated in fig. 10, the training process may include the following steps.
Step 1001, performing feature extraction on the sample image through an encoder to obtain a sample feature map corresponding to the sample image, where the sample image includes an image of a sample reconstruction object.
Optionally, the encoder is a convolutional neural network composed of a plurality of convolutional layers. The present embodiment does not limit the structure of the encoder.
Step 1002, inputting the sample feature map into the mesh reconstruction network, and inputting the sample feature map into the voxel reconstruction network.
Step 1003, obtaining the sample mesh information output by the mesh reconstruction network and the sample voxel information output by the voxel reconstruction network.
The above-mentioned implementation of steps 1002 to 1003 can refer to steps 802 to 803, and this embodiment is not described herein again.
And step 1004, acquiring an intermediate feature map output by the encoder, wherein the feature extraction depth of the intermediate feature map is lower than that of the sample feature map.
In the process of carrying out feature extraction on the image by the encoder, the depth of feature extraction is continuously increased (more and more abstract), and finally, a feature map representing deep features of the image is output. In order to improve the quality of the subsequent reconstructed image, the computer device needs to perform image reconstruction based on the intermediate feature map output in the encoding process of the encoder in addition to the sample feature map (deep feature).
In one possible implementation, a computer device obtains an intermediate feature map of a given convolutional layer output in an encoder, thereby obtaining a feature map characterizing shallow features.
And 1005, performing jump connection on the intermediate feature map and the sample feature map, and inputting the feature map obtained by connection into a third decoder to obtain a reconstructed sample image output by the third decoder.
Optionally, the computer device connects the obtained sample feature map and the intermediate feature map in a jump connection manner. Wherein, the process of jumping connection between feature maps can refer to the process of jumping connection between features (connection between deep features and shallow features) at different levels in a U-shaped network (U-net).
In an illustrative example, the computer device obtains 5 intermediate feature maps, which are a first intermediate feature map, a second intermediate feature map, a third intermediate feature map, a fourth intermediate feature map, and a fifth intermediate feature map, respectively. When a jump connection is made, the computer device connects the first intermediate feature map with the sample feature map, connects the second intermediate feature map with the fifth intermediate feature map, and connects the third intermediate feature map with the fourth intermediate feature map.
Further, the computer device inputs the connected feature map into a third decoder, and the third decoder performs sample image reconstruction (or called restoration) based on the features. The third decoder comprises a plurality of deconvolution layers for deconvolution processing on the input features, so that the original image is restored.
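A minimal sketch of one skip-connected stage of such a decoder, assuming channel-wise concatenation of the intermediate feature map with the current decoder features, followed by a deconvolution; the layer sizes are example values:

```python
import torch
import torch.nn as nn

class SkipDecoderStage(nn.Module):
    """One third-decoder stage: concatenate a skip (intermediate) feature
    map with the current features, then upsample by deconvolution."""
    def __init__(self, in_ch: int, skip_ch: int, out_ch: int):
        super().__init__()
        self.deconv = nn.ConvTranspose2d(in_ch + skip_ch, out_ch,
                                         kernel_size=4, stride=2, padding=1)

    def forward(self, x: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        # x and skip must share spatial size (N, *, H, W)
        fused = torch.cat((x, skip), dim=1)   # jump connection by concatenation
        return torch.relu(self.deconv(fused))
```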
Step 1006, determining a network loss according to the sample grid information, the sample voxel information, the standard three-dimensional model, the sample image, and the reconstructed sample image.
In one possible implementation, this step may include the following steps:
First, determining the grid reconstruction loss according to the sample grid information and the standard three-dimensional model.
In a possible implementation, the computer device calculates the grid reconstruction loss through an L2-norm loss function according to the sample grid information and the standard grid information corresponding to the standard three-dimensional model. The calculation can be expressed as the following formula:
L2 = l2(y, B)
where L2 is the grid reconstruction loss, l2 is the L2-norm loss function, y is the standard grid information of the standard three-dimensional model, and B is the sample grid information.
Second, determining the voxel reconstruction loss according to the sample voxel information and the standard three-dimensional model.
In a possible implementation, the computer device calculates the voxel reconstruction loss through a mean square error (MSE) function according to the sample voxel information and the standard voxel information corresponding to the standard three-dimensional model. The calculation can be expressed as the following formula:
L3 = MSE(y, C)
where L3 is the voxel reconstruction loss, MSE is the mean square error function, y is the standard voxel information of the standard three-dimensional model, and C is the sample voxel information.
Third, determining the image reconstruction loss according to the reconstructed sample image and the sample image.
In a possible embodiment, the computer device uses the sample image as supervision and calculates the image reconstruction loss of the reconstructed sample image through the MSE function. The calculation can be expressed as the following formula:
L1 = MSE(x, A)
where L1 is the image reconstruction loss, x is the sample image, and A is the reconstructed sample image.
Fourth, determining the network loss according to the grid reconstruction loss, the voxel reconstruction loss, and the image reconstruction loss.
In one possible implementation, the computer device determines the network loss based on the grid reconstruction loss, the voxel reconstruction loss, the image reconstruction loss, and the hyper-parameter corresponding to each reconstruction loss (used to control the proportion of that loss). The network loss can be expressed as:
L = L1 + αL2 + βL3
where α and β are the hyper-parameters corresponding to the grid reconstruction loss and the voxel reconstruction loss, respectively.
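Under the definitions above, the composite network loss can be written down directly. The sketch below assumes that standard grid and voxel information have been precomputed from the standard three-dimensional model, reads the "second norm loss" as the L2 norm of the difference, and uses illustrative default values for α and β.

```python
import torch
import torch.nn.functional as F

# Sketch of the composite network loss L = L1 + α·L2 + β·L3.
def network_loss(reconstructed_image, sample_image,     # for L1
                 sample_grid, standard_grid,            # for L2
                 sample_voxels, standard_voxels,        # for L3
                 alpha: float = 1.0, beta: float = 1.0) -> torch.Tensor:
    l1 = F.mse_loss(reconstructed_image, sample_image)  # image reconstruction
    # "second norm loss" read here as the L2 norm of the difference
    l2 = torch.linalg.vector_norm(sample_grid - standard_grid, ord=2)
    l3 = F.mse_loss(sample_voxels, standard_voxels)     # voxel reconstruction
    return l1 + alpha * l2 + beta * l3
```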
Step 1007, training the mesh reconstruction network, the voxel reconstruction network, the encoder and the third decoder according to the network loss.
Similar to step 805 above, the computer device performs network training in an end-to-end manner; that is, the network parameters of the mesh reconstruction network, the voxel reconstruction network, the encoder, and the third decoder are trained through a back-propagation algorithm or a gradient descent algorithm according to the network loss, until the network loss meets the convergence condition and network training is complete.
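A joint end-to-end training loop of this kind might look as follows; the optimizer choice (Adam), the learning rate, and the helper names follow the earlier sketches and are assumptions, not requirements of the embodiment.

```python
import itertools
import torch

# Sketch of joint end-to-end training: all four networks are updated from
# the single composite loss via back-propagation / gradient descent.
def train(encoder, mesh_net, voxel_net, third_decoder,
          data_loader, compute_loss, lr: float = 1e-4,
          max_epochs: int = 100):
    params = itertools.chain(encoder.parameters(), mesh_net.parameters(),
                             voxel_net.parameters(),
                             third_decoder.parameters())
    optimizer = torch.optim.Adam(params, lr=lr)  # Adam is an assumption
    for _ in range(max_epochs):
        for batch in data_loader:
            optimizer.zero_grad()
            loss = compute_loss(batch)  # composite network loss L
            loss.backward()             # back-propagation
            optimizer.step()            # parameter update
        # in practice, a convergence check on the loss ends training early
```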
In this embodiment, the computer device constructs a network loss from three aspects of a grid reconstruction loss, a voxel reconstruction loss, and an image reconstruction loss, and trains the encoder and the decoder based on the network loss, thereby improving the encoding and decoding quality of the encoder and the decoder, and improving the quality of subsequent model reconstruction.
Fig. 11 is a block diagram of a device for reconstructing a three-dimensional model according to an exemplary embodiment of the present application, where the device may be disposed in a computer apparatus according to the foregoing embodiment, and as shown in fig. 11, the device includes:
an image obtaining module 1101, configured to obtain a single target image, where the target image includes an image of a target reconstruction object;
a first extraction module 1102, configured to perform feature extraction on the target image to obtain a target feature map corresponding to the target image;
an input module 1103, configured to input the target feature map into a mesh reconstruction network, and input the target feature map into a voxel reconstruction network, where the mesh reconstruction network and the voxel reconstruction network are connected semantically, the mesh reconstruction network is configured to reconstruct a three-dimensional mesh model of the target reconstruction object according to the target feature map, and the voxel reconstruction network is configured to reconstruct a voxel model of the target reconstruction object according to the target feature map;
and the reconstructing module 1104 is configured to construct a target three-dimensional model according to target mesh information output by the mesh reconstructing network, where the target mesh information is information of mesh vertices in the target three-dimensional model.
Optionally, the mesh reconstruction network includes a key point extraction sub-network, an adjacency extraction sub-network, and a first decoder;
the input module 1103 includes:
a first input unit, configured to input the target feature map into the key point extraction sub-network, so as to obtain a key point feature map output by the key point extraction sub-network, where the key point feature map includes image features of key feature points in the target image;
a second input unit, configured to input the target feature map into the adjacency relation extraction sub-network, so as to obtain an adjacency matrix output by the adjacency relation extraction sub-network, where the adjacency matrix is used to represent an adjacency relation of feature points in the target reconstruction object;
and the graph convolution unit is used for carrying out graph convolution processing on the key point feature graph and the adjacency matrix through the first decoder.
Optionally, the graph convolution unit is configured to:
determining the key feature points according to the feature values at each position in the key point feature map;
determining a key point adjacency matrix according to the key feature points and the adjacency matrix, wherein the key point adjacency matrix is used for representing the adjacency relation between the key feature points;
and performing graph convolution processing on the key point feature graph and the key point adjacency matrix through the first decoder.
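As a non-limiting sketch of this graph convolution unit, key feature points could be selected by thresholding their feature responses, the adjacency matrix restricted to the selected points, and a plain A·X·W graph-convolution step applied; the threshold, the selection rule, and the propagation rule are all assumptions of this sketch.

```python
import torch

# Sketch of the graph convolution unit: select key feature points from
# the key point feature map, restrict the adjacency matrix to them, and
# propagate features with one graph-convolution step.
def graph_convolution_unit(keypoint_features: torch.Tensor,  # (N, C)
                           adjacency: torch.Tensor,          # (N, N)
                           weight: torch.Tensor,             # (C, C_out)
                           threshold: float = 0.5):
    # 1. determine the key feature points from the feature values
    scores = keypoint_features.abs().mean(dim=1)
    idx = (scores > threshold).nonzero(as_tuple=True)[0]
    # 2. key point adjacency matrix: adjacency restricted to kept points
    key_adjacency = adjacency[idx][:, idx]
    # 3. graph convolution over the kept points (plain A·X·W rule)
    key_features = keypoint_features[idx]
    return torch.relu(key_adjacency @ key_features @ weight), idx
```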
Optionally, the voxel reconstruction network includes a second decoder, the second decoder is configured to perform three-dimensional convolution processing on the target feature map, and the first decoder and the second decoder are connected semantically;
the device further comprises:
a first feature obtaining module, configured to obtain a three-dimensional feature output by an nth three-dimensional convolutional layer in the second decoder, where the three-dimensional feature is used to characterize a reconstructed voxel, and n is an integer greater than or equal to 1;
a second feature obtaining module, configured to obtain a graph node feature output by an nth graph convolution layer in the first decoder, where the graph node feature is used to characterize a feature of a reconstructed mesh vertex;
and the feature fusion module is used for performing feature fusion on the three-dimensional features and the graph node features and inputting the fused features into the (n + 1) th graph convolutional layer.
Optionally, the feature fusion module includes:
the coordinate acquisition unit is used for acquiring the vertex coordinates of the reconstructed mesh vertices corresponding to the graph node features;
and the fusion unit is used for performing a linear interpolation operation on the three-dimensional features around the vertex coordinates, and performing feature fusion on the features obtained by the linear interpolation operation and the graph node features.
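For illustration, sampling the three-dimensional feature volume at each vertex by (tri)linear interpolation and concatenating the result with the graph node features could be sketched as follows; normalizing the vertex coordinates to [-1, 1] and fusing by concatenation are assumptions of this sketch.

```python
import torch
import torch.nn.functional as F

# Sketch of the fusion unit: each reconstructed mesh vertex samples the
# 3D voxel feature volume at its coordinates by trilinear interpolation,
# and the sampled feature is concatenated with its graph node feature.
def fuse(voxel_features: torch.Tensor,   # (1, C, D, H, W)
         node_features: torch.Tensor,    # (V, F_node)
         vertex_coords: torch.Tensor):   # (V, 3), normalized to [-1, 1]
    grid = vertex_coords.view(1, 1, 1, -1, 3)        # (1, 1, 1, V, 3)
    sampled = F.grid_sample(voxel_features, grid,
                            mode="bilinear",         # trilinear on 5D input
                            align_corners=True)      # (1, C, 1, 1, V)
    sampled = sampled.view(voxel_features.shape[1], -1).t()  # (V, C)
    return torch.cat([node_features, sampled], dim=1)        # (V, F_node + C)
```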
Optionally, the apparatus further comprises:
a display module for displaying the target three-dimensional model in a virtual environment, the target three-dimensional model being at least one of a virtual prop, a virtual vehicle, or a virtual object in the virtual environment;
alternatively,
and the printing module is used for sending the model data of the target three-dimensional model to a three-dimensional printing device, and the three-dimensional printing device is used for printing the entity model according to the model data.
Optionally, the apparatus further comprises:
the second extraction module is used for extracting the characteristics of a sample image to obtain a sample characteristic diagram corresponding to the sample image, wherein the sample image comprises an image of a sample reconstruction object;
a sample feature input module, configured to input the sample feature map into the mesh reconstruction network, and input the sample feature map into the voxel reconstruction network;
the sample information acquisition module is used for acquiring sample grid information output by the grid reconstruction network and sample voxel information output by the voxel reconstruction network;
a loss calculation module, configured to determine a network loss according to the sample grid information, the sample voxel information, and a standard three-dimensional model, where the standard three-dimensional model is a three-dimensional model corresponding to the sample reconstruction object;
and the training module is used for training the grid reconstruction network and the voxel reconstruction network according to the network loss.
Optionally, the sample feature map is obtained by performing feature extraction on the sample image by an encoder;
the device further comprises:
the third feature acquisition module is used for acquiring an intermediate feature map output by the encoder, wherein the feature extraction depth of the intermediate feature map is lower than that of the sample feature map;
the image reconstruction module is used for performing skip connection on the intermediate feature map and the sample feature map, and inputting the connected feature map into a third decoder to obtain a reconstructed sample image output by the third decoder;
the loss calculation module is configured to determine the network loss according to the sample grid information, the sample voxel information, the standard three-dimensional model, the sample image, and the reconstructed sample image;
the training module is configured to train the mesh reconstruction network, the voxel reconstruction network, the encoder, and the third decoder according to the network loss.
Optionally, the loss calculating module is further configured to:
determining grid reconstruction loss according to the sample grid information and the standard three-dimensional model;
determining voxel reconstruction loss according to the sample voxel information and the standard three-dimensional model;
determining an image reconstruction loss according to the reconstructed sample image and the sample image;
determining the grid reconstruction loss, the voxel reconstruction loss, and the image reconstruction loss as the network loss.
In summary, unlike the related art in which three-dimensional reconstruction must be performed based on multiple two-dimensional images captured at different angles, in the embodiments of the present application the computer device only needs to acquire a single target image: it performs feature extraction on the target image, performs grid reconstruction based on the feature map through the grid reconstruction network, performs voxel reconstruction based on the feature map through the voxel reconstruction network, and then completes three-dimensional model reconstruction according to the target grid information output by the grid reconstruction network. This reduces the requirements on the two-dimensional images needed for three-dimensional reconstruction, simplifies the flow of three-dimensional reconstruction, and expands its application scenarios (objects without a physical entity can also be reconstructed). In addition, because the grid reconstruction network and the voxel reconstruction network are semantically connected, features can be fused between the networks, which improves the accuracy of the finally output target grid information and thus the quality of three-dimensional reconstruction from a single image.
It should be noted that: the three-dimensional model reconstruction device provided in the above embodiment is only illustrated by dividing the functional modules, and in practical applications, the functions may be distributed by different functional modules according to needs, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the three-dimensional model reconstruction device provided in the above embodiments and the three-dimensional model reconstruction method embodiment belong to the same concept, and specific implementation processes thereof are described in the method embodiment and are not described herein again.
Referring to fig. 12, a schematic structural diagram of a computer device according to an exemplary embodiment of the present application is shown. Specifically, the computer device 1200 includes a central processing unit (CPU) 1201, a system memory 1204 including a random access memory 1202 and a read-only memory 1203, and a system bus 1205 connecting the system memory 1204 and the CPU 1201. The computer device 1200 further includes a basic input/output system (I/O system) 1206, which facilitates the transfer of information between devices within the computer, and a mass storage device 1207 for storing an operating system 1213, application programs 1214, and other program modules 1215.
The basic input/output system 1206 includes a display 1208 for displaying information and an input device 1209, such as a mouse or keyboard, through which a user inputs information. The display 1208 and the input device 1209 are both connected to the central processing unit 1201 through an input/output controller 1210 coupled to the system bus 1205. The basic input/output system 1206 may also include the input/output controller 1210 for receiving and processing input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, the input/output controller 1210 also provides output to a display screen, a printer, or another type of output device.
The mass storage device 1207 is connected to the central processing unit 1201 through a mass storage controller (not shown) connected to the system bus 1205. The mass storage device 1207 and its associated computer-readable media provide non-volatile storage for the computer device 1200. That is, the mass storage device 1207 may include a computer-readable medium (not shown) such as a hard disk or drive.
Without loss of generality, the computer-readable media may comprise computer storage media and communication media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for the storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include random access memory (RAM), read-only memory (ROM), flash memory or other solid-state memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices. Of course, those skilled in the art will appreciate that computer storage media are not limited to the foregoing. The system memory 1204 and the mass storage device 1207 described above may be collectively referred to as memory.
The memory stores one or more programs configured to be executed by the one or more central processing units 1201, the one or more programs containing instructions for implementing the methods described above, and the central processing unit 1201 executes the one or more programs to implement the methods provided by the various method embodiments described above.
According to various embodiments of the present application, the computer device 1200 may also be operated by connecting to a remote computer over a network such as the Internet. That is, the computer device 1200 may connect to the network 1212 through a network interface unit 1211 coupled to the system bus 1205, or may use the network interface unit 1211 to connect to other types of networks or remote computer systems (not shown).
The memory also includes one or more programs, stored in the memory, that include instructions for performing the steps performed by the computer device in the methods provided by the embodiments of the present application.
The present application further provides a computer-readable storage medium, where at least one instruction, at least one program, a code set, or a set of instructions is stored in the computer-readable storage medium, and the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by a processor to implement the method for reconstructing a three-dimensional model according to any one of the foregoing embodiments.
The present application further provides a computer program product, which when run on a computer, causes the computer to execute the method for reconstructing a three-dimensional model provided by the above-mentioned method embodiments.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by hardware related to program instructions. The program may be stored in a computer-readable storage medium, which may be the computer-readable storage medium contained in the memory of the above embodiments, or a standalone computer-readable storage medium that is not assembled into the terminal. The computer-readable storage medium stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement the method of reconstructing a three-dimensional model according to any of the above method embodiments.
Optionally, the computer-readable storage medium may include: a ROM, a RAM, a solid-state drive (SSD), an optical disc, or the like. The RAM may include a resistive random access memory (ReRAM), a dynamic random access memory (DRAM), or the like. The above serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is intended to be exemplary only, and not to limit the present application, and any modifications, equivalents, improvements, etc. made within the spirit and scope of the present application are intended to be included therein.

Claims (12)

1. A method of reconstructing a three-dimensional model, the method comprising:
obtaining a single target image, wherein the target image comprises an image of a target reconstruction object;
performing feature extraction on the target image to obtain a target feature map corresponding to the target image;
inputting the target feature map into a grid reconstruction network, and inputting the target feature map into a voxel reconstruction network, wherein the grid reconstruction network and the voxel reconstruction network are connected semantically, the grid reconstruction network is used for reconstructing a three-dimensional grid model of the target reconstruction object according to the target feature map, and the voxel reconstruction network is used for reconstructing a voxel model of the target reconstruction object according to the target feature map;
and constructing a target three-dimensional model according to target grid information output by the grid reconstruction network, wherein the target grid information is information of grid vertexes in the target three-dimensional model.
2. The method of claim 1, wherein the mesh reconstruction network comprises a keypoint extraction subnetwork, an adjacency extraction subnetwork, and a first decoder;
the inputting the target feature map into the mesh reconstruction network includes:
inputting the target feature map into the key point extraction sub-network to obtain a key point feature map output by the key point extraction sub-network, wherein the key point feature map comprises image features of key feature points in the target image;
inputting the target feature map into the adjacency relation extraction sub-network to obtain an adjacency matrix output by the adjacency relation extraction sub-network, wherein the adjacency matrix is used for representing the adjacency relation of feature points in the target reconstruction object;
and carrying out graph convolution processing on the key point feature graph and the adjacency matrix through the first decoder.
3. The method of claim 2, wherein the performing, by the first decoder, graph convolution processing on the keypoint feature graph and the adjacency matrix comprises:
determining the key feature points according to the feature values at each position in the key point feature map;
determining a key point adjacency matrix according to the key feature points and the adjacency matrix, wherein the key point adjacency matrix is used for representing the adjacency relation between the key feature points;
and performing graph convolution processing on the key point feature graph and the key point adjacency matrix through the first decoder.
4. The method according to claim 2, wherein the voxel reconstruction network comprises a second decoder for performing three-dimensional convolution processing on the target feature map, and the first decoder and the second decoder are connected semantically;
after inputting the target feature map into the mesh reconstruction network and inputting the target feature map into the voxel reconstruction network, the method further comprises:
acquiring three-dimensional features output by the nth three-dimensional convolutional layer in the second decoder, wherein the three-dimensional features are used for representing the features of reconstructed voxels, and n is an integer greater than or equal to 1;
obtaining graph node characteristics output by the nth graph convolution layer in the first decoder, wherein the graph node characteristics are used for representing characteristics of reconstructed grid vertexes;
and performing feature fusion on the three-dimensional features and the graph node features, and inputting the fused features into the (n + 1) th graph convolutional layer.
5. The method of claim 4, wherein said feature fusing said three-dimensional features and said graph node features comprises:
acquiring vertex coordinates of the graph node characteristics corresponding to the vertex of the reconstruction grid;
and performing a linear interpolation operation on the three-dimensional features around the vertex coordinates, and performing feature fusion on the features obtained by the linear interpolation operation and the graph node features.
6. The method according to any one of claims 1 to 5, wherein after constructing the target three-dimensional model based on the target mesh information output by the mesh reconstruction network, the method further comprises:
displaying the target three-dimensional model in a virtual environment, the target three-dimensional model being at least one of a virtual prop, a virtual vehicle, or a virtual object in the virtual environment;
alternatively,
and sending the model data of the target three-dimensional model to a three-dimensional printing device, wherein the three-dimensional printing device is used for printing a solid model according to the model data.
7. The method of any of claims 1 to 5, further comprising:
performing feature extraction on a sample image to obtain a sample feature map corresponding to the sample image, wherein the sample image comprises an image of a sample reconstruction object;
inputting the sample feature map into the grid reconstruction network, and inputting the sample feature map into the voxel reconstruction network;
acquiring sample grid information output by the grid reconstruction network and sample voxel information output by the voxel reconstruction network;
determining network loss according to the sample grid information, the sample voxel information and a standard three-dimensional model, wherein the standard three-dimensional model is a three-dimensional model corresponding to the sample reconstruction object;
and training the grid reconstruction network and the voxel reconstruction network according to the network loss.
8. The method according to claim 7, wherein the sample feature map is obtained by feature extraction of the sample image by an encoder;
the method further comprises the following steps:
acquiring an intermediate feature map output by the encoder, wherein the feature extraction depth of the intermediate feature map is lower than that of the sample feature map;
performing skip connection on the intermediate feature map and the sample feature map, and inputting the connected feature map into a third decoder to obtain a reconstructed sample image output by the third decoder;
determining a network loss according to the sample grid information, the sample voxel information, and a standard three-dimensional model, comprising:
determining the network loss from the sample mesh information, the sample voxel information, the standard three-dimensional model, the sample image, and the reconstructed sample image;
the training the mesh reconstruction network and the voxel reconstruction network according to the network loss includes:
training the mesh reconstruction network, the voxel reconstruction network, the encoder, and the third decoder according to the network loss.
9. The method of claim 8, wherein determining the network loss from the sample mesh information, the sample voxel information, the standard three-dimensional model, the sample image, and the reconstructed sample image comprises:
determining grid reconstruction loss according to the sample grid information and the standard three-dimensional model;
determining voxel reconstruction loss according to the sample voxel information and the standard three-dimensional model;
determining an image reconstruction loss according to the reconstructed sample image and the sample image;
determining the grid reconstruction loss, the voxel reconstruction loss, and the image reconstruction loss as the network loss.
10. An apparatus for reconstructing a three-dimensional model, the apparatus comprising:
the image acquisition module is used for acquiring a single target image, wherein the target image comprises an image of a target reconstruction object;
the first extraction module is used for extracting the features of the target image to obtain a target feature map corresponding to the target image;
an input module, configured to input the target feature map into a mesh reconstruction network, and input the target feature map into a voxel reconstruction network, where the mesh reconstruction network and the voxel reconstruction network are connected semantically, the mesh reconstruction network is configured to reconstruct a three-dimensional mesh model of the target reconstruction object according to the target feature map, and the voxel reconstruction network is configured to reconstruct a voxel model of the target reconstruction object according to the target feature map;
and the reconstruction module is used for constructing a target three-dimensional model according to target grid information output by the grid reconstruction network, wherein the target grid information is information of grid vertexes in the target three-dimensional model.
11. A computer device comprising a processor and a memory, said memory having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, said at least one instruction, said at least one program, said set of codes, or said set of instructions being loaded and executed by said processor to implement a method of reconstructing a three-dimensional model according to any one of claims 1 to 9.
12. A computer readable storage medium, having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement a method of reconstructing a three-dimensional model according to any one of claims 1 to 9.
CN202010135051.0A 2020-03-02 2020-03-02 Three-dimensional model reconstruction method, device, equipment and storage medium Active CN111369681B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010135051.0A CN111369681B (en) 2020-03-02 2020-03-02 Three-dimensional model reconstruction method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111369681A CN111369681A (en) 2020-07-03
CN111369681B true CN111369681B (en) 2022-04-15

Family

ID=71204249

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010135051.0A Active CN111369681B (en) 2020-03-02 2020-03-02 Three-dimensional model reconstruction method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111369681B (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112001268B * 2020-07-31 2024-01-12 Zhongke Zhiyun Technology Co., Ltd. Face calibration method and equipment
CN112070893B * 2020-09-15 2024-04-02 Dalian University of Technology Dynamic sea surface three-dimensional modeling method based on deep learning and storage medium
CN112184884A * 2020-09-23 2021-01-05 Shanghai Eye Control Technology Co., Ltd. Three-dimensional model construction method and device, computer equipment and storage medium
CN112233228B * 2020-10-28 2024-02-20 Wuyi University Unmanned aerial vehicle-based urban three-dimensional reconstruction method, device and storage medium
CN112270760B * 2020-11-03 2023-07-25 Beijing Baidu Netcom Science and Technology Co., Ltd. Positioning method, positioning device, electronic equipment and storage medium
CN112529770B * 2020-12-07 2024-01-26 Vivo Mobile Communication Co., Ltd. Image processing method, device, electronic equipment and readable storage medium
CN112634303B * 2020-12-29 2022-02-25 Beijing Shenrui Bolian Technology Co., Ltd. Method, system, device and storage medium for assisting blind person in visual reconstruction
CN112668228B * 2021-01-05 2022-11-04 PetroChina Co., Ltd. Model construction method, device and equipment for two-dimensional porous medium and storage medium
CN112750201B * 2021-01-15 2024-03-29 Zhejiang SenseTime Technology Development Co., Ltd. Three-dimensional reconstruction method, related device and equipment
CN112767526B * 2021-02-03 2022-09-20 Dalian University of Technology Relief grid reconstruction method suitable for texture mapping
CN112862972B * 2021-02-22 2023-08-18 Beijing Technology and Business University Surface structure grid generation method
CN113117344B * 2021-04-01 2023-07-18 Guangzhou Huya Technology Co., Ltd. Voxel building generation method and device, electronic equipment and storage medium
CN113298931B * 2021-05-14 2023-09-05 Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences Reconstruction method and device of object model, terminal equipment and storage medium
CN113610125B * 2021-07-23 2023-12-22 Beihang University Hyperspectral classification method based on encoder-decoder graph neural network
CN113724393B * 2021-08-12 2024-03-19 Beijing Dajia Internet Information Technology Co., Ltd. Three-dimensional reconstruction method, device, equipment and storage medium
WO2023019478A1 * 2021-08-18 2023-02-23 Shenzhen Institute of Advanced Technology Three-dimensional reconstruction method and apparatus, electronic device, and readable storage medium
CN114387346A * 2022-03-25 2022-04-22 Alibaba DAMO Academy (Hangzhou) Technology Co., Ltd. Image recognition and prediction model processing method, three-dimensional modeling method and device
CN115272587B * 2022-09-26 2023-05-30 Shenzhen Anycubic Technology Co., Ltd. Model file generation method and medium for 3D printing and electronic equipment

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107228640A * 2017-05-19 2017-10-03 Shandong University The reconstructing method and system of a kind of 3D body forms

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI346309B (en) * 2007-12-21 2011-08-01 Ind Tech Res Inst Method for reconstructing three dimension model
US10580143B2 (en) * 2015-11-04 2020-03-03 Intel Corporation High-fidelity 3D reconstruction using facial features lookup and skeletal poses in voxel models
CN109285215B * 2018-08-28 2021-01-08 Tencent Technology (Shenzhen) Co., Ltd. Human body three-dimensional model reconstruction method and device and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (country: HK; legal event code: DE; document number: 40025926)

GR01 Patent grant