CN115375847B - Material recovery method, three-dimensional model generation method and model training method - Google Patents

Material recovery method, three-dimensional model generation method and model training method

Info

Publication number
CN115375847B
CN115375847B
Authority
CN
China
Prior art keywords
target object
gradient
model
image
texture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211029148.9A
Other languages
Chinese (zh)
Other versions
CN115375847A
Inventor
吴进波
刘星
赵晨
丁二锐
吴甜
王海峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202211029148.9A priority Critical patent/CN115375847B/en
Publication of CN115375847A publication Critical patent/CN115375847A/en
Application granted granted Critical
Publication of CN115375847B publication Critical patent/CN115375847B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/04 Texture mapping
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/10 Geometric effects
    • G06T15/20 Perspective computation
    • G06T15/205 Image-based rendering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30244 Camera pose

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Processing Or Creating Images (AREA)
  • Image Analysis (AREA)
  • Image Generation (AREA)

Abstract

The disclosure provides a material recovery method, a three-dimensional model generation method, a model training method, and corresponding devices, equipment and media. It relates to the field of artificial intelligence, in particular to the technical fields of augmented reality, virtual reality, computer vision and deep learning, and can be applied to scenes such as the metaverse. The material recovery method is implemented as follows: generating a grid model of a target object according to voxel data of the target object; determining, for each grid in the grid model, the pixel position of its corresponding pixel point on a material map for the target object; and inputting the grid model and the pixel positions into a material estimation network to obtain the material map for the target object.

Description

Material recovery method, three-dimensional model generation method and model training method
Technical Field
The disclosure relates to the field of artificial intelligence, in particular to the technical fields of augmented reality, virtual reality, computer vision, deep learning and the like, and can be applied to scenes such as the metaverse.
Background
With the development of computer and network technology, virtual reality and augmented reality technologies have advanced rapidly. In virtual reality and augmented reality applications, objects often need to be reconstructed, and reconstructing an object requires material information about that object. Typically, material information is added manually to the reconstructed three-dimensional model according to an art design.
Disclosure of Invention
The disclosure aims to provide a material recovery method based on deep learning, a three-dimensional model generation method, a material recovery model training method, a device, equipment and a medium, so as to generate a three-dimensional model based on recovered material information, and enable the generated three-dimensional model to be applied to a traditional rendering engine.
According to one aspect of the present disclosure, there is provided a material recovery method, including: generating a grid model of the target object according to voxel data of the target object; determining pixel positions of pixel points corresponding to each grid in the grid model on a material map for a target object; and inputting the grid model and the pixel positions into a material estimation network to obtain a material map for the target object.
According to another aspect of the present disclosure, there is provided a method for generating a three-dimensional model, including: generating a texture map for the target object according to voxel data for the target object; and generating a three-dimensional model of the target object according to the texture map and a grid model of the target object, wherein the texture map is obtained by adopting the texture recovery method provided by the disclosure, and the grid model is generated according to voxel data.
According to another aspect of the present disclosure, there is provided a training method of a material recovery model, wherein the material recovery model includes a material estimation network; the training method comprises the following steps: generating a grid model of the target object according to an original image comprising the target object and a camera pose aiming at the original image; determining pixel positions of pixel points corresponding to each grid in the grid model on an original image; inputting the grid model and the pixel position into a material estimation network to obtain a material map for a target object; rendering according to the material map, the grid model and the camera pose to obtain a target image comprising a target object; and training the material estimation network according to the difference between the target image and the original image.
According to one aspect of the present disclosure, there is provided a material recovery apparatus including: the model generation module is used for generating a grid model of the target object according to voxel data aiming at the target object; the pixel position determining module is used for determining the pixel position of a pixel point corresponding to each grid in the grid model on the texture map aiming at the target object; and the mapping obtaining module is used for inputting the grid model and the pixel positions into a material estimation network to obtain a material mapping aiming at the target object.
According to another aspect of the present disclosure, there is provided a generating apparatus of a three-dimensional model, including: the mapping generation module is used for generating a material mapping for the target object according to the voxel data for the target object; and the model generation module is used for generating a three-dimensional model of the target object according to the texture map and a grid model of the target object, wherein the texture map is obtained by adopting the texture recovery device provided by the disclosure, and the grid model is generated according to voxel data.
According to another aspect of the present disclosure, there is provided a training apparatus of a material recovery model, wherein the material recovery model includes a material estimation network, the training apparatus comprising: the model generation module is used for generating a grid model of the target object according to an original image comprising the target object and a camera pose aiming at the original image; the pixel position determining module is used for determining the pixel position of the pixel point corresponding to each grid in the grid model on the original image; the mapping obtaining module is used for inputting the grid model and the pixel position into a material estimation network to obtain a material mapping aiming at the target object; the first image rendering module is used for rendering a target image comprising a target object according to the material map, the grid model and the camera pose; and the training module is used for training the material estimation network according to the difference between the target image and the original image.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform at least one of the following methods provided by the present disclosure: a material recovery method, a three-dimensional model generation method and a material recovery model training method.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform at least one of the following methods provided by the present disclosure: a material recovery method, a three-dimensional model generation method and a material recovery model training method.
According to another aspect of the present disclosure, there is provided a computer program product comprising computer programs/instructions which, when executed by a processor, implement at least one of the following methods provided by the present disclosure: a material recovery method, a three-dimensional model generation method and a material recovery model training method.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
Fig. 1 is a schematic view of an application scenario of the material recovery method, the generation method of a three-dimensional model, and the training method and apparatus of a material recovery model according to an embodiment of the present disclosure;
FIG. 2 is a flow chart of a material recovery method according to an embodiment of the disclosure;
FIG. 3 is a schematic diagram of a grid model of a target object generated from voxel data according to an embodiment of the disclosure;
FIG. 4 is a flow diagram of a method of generating a three-dimensional model according to an embodiment of the present disclosure;
FIG. 5 is a flow chart of a training method of a texture restoration model according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a training material recovery model according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of a training texture recovery model according to another embodiment of the present disclosure;
FIG. 8 is a block diagram of a texture restoration device according to an embodiment of the present disclosure;
FIG. 9 is a block diagram of a three-dimensional model generating apparatus according to an embodiment of the present disclosure;
FIG. 10 is a block diagram of a training apparatus for a texture restoration model according to an embodiment of the present disclosure; and
fig. 11 is a block diagram of an electronic device for implementing the methods of embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the construction of a three-dimensional model, recovering materials is an indispensable step. Typically, material information is added manually to each grid in the three-dimensional model by relying on art design. This way of adding material information is time-consuming and labor-intensive.
Alternatively, a neural rendering-based approach may be employed to recover texture information. However, neural rendering methods generally generate an image only at a predetermined viewing angle and cannot output a mesh model and a texture map, so a conventional three-dimensional rendering engine cannot be used for rendering. As such, the use of this approach is greatly limited.
In view of this, the present disclosure aims to provide a material recovery method capable of generating a material map for a three-dimensional model, so that the three-dimensional model can be used with a conventional rendering engine, as well as a method for generating the three-dimensional model and a method for training the material recovery model.
The following is a description of the terms of art to which the present disclosure relates:
A three-dimensional rendering engine is a set of algorithms that abstracts various real-world objects into curves or polygons and, via a computer, outputs a final image.
Neural rendering is a generic term for methods that synthesize images with a deep network, aiming to realize all or part of the modeling and rendering functions of image rendering.
Image rendering is the process of converting three-dimensional light-energy transfer into a two-dimensional image. The work involved in image rendering includes performing geometric transformation, projective transformation, perspective transformation and window clipping on the three-dimensional model, and generating an image according to the acquired material and shading information.
The symbol distance field, sign Distance Function, SDF, also known as directional distance function (oriented Distance Function), is used to determine the distance of a point to the boundary of a region over a limited region in space and to define the symbol of the distance at the same time. If the point is located outside the region boundary, the sign distance is negative if the point is located inside the region boundary, and the sign distance is 0 if the point is located on the region boundary. For example, the distance between any point in space and the boundary of the region may be represented by a field formed by the value of f (x), i.e., a sign distance field.
Rasterization is the process of converting a primitive into a two-dimensional image. Each point on the two-dimensional image contains color, depth and texture data. The purpose of rasterization is to find the pixels covered by a geometric unit (e.g., a triangle). Based on the positions of the triangle vertices, rasterization determines how many pixels are needed to form the triangle and what information each pixel should receive, such as UV coordinates, by interpolating the vertex data. In other words, rasterization is the process by which geometric data, after a series of transformations, is finally converted into pixels and thus rendered on a display device. Each three-dimensional model is defined by vertices and the triangular faces formed by those vertices. When a three-dimensional model is drawn on the screen, the process of filling each pixel (grid) covered by a triangular face according to the three vertices of that face is called rasterization, as sketched below.
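For illustration, the following is a minimal Python sketch of the rasterization idea just described: for a single triangle, it finds the covered pixels from the positions of the three vertices and interpolates per-vertex data (here UV coordinates) with barycentric weights. All names are hypothetical; this is illustrative only, not an optimized rasterizer and not the renderer used by the disclosure.

```python
import numpy as np

def rasterize_triangle(v2d, uv, height, width):
    """v2d: (3, 2) screen-space vertex positions; uv: (3, 2) per-vertex UV coordinates.
    Returns a dict mapping each covered pixel (x, y) to its interpolated UV."""
    v2d = np.asarray(v2d, dtype=float)
    uv = np.asarray(uv, dtype=float)
    (x0, y0), (x1, y1), (x2, y2) = v2d
    denom = (y1 - y2) * (x0 - x2) + (x2 - x1) * (y0 - y2)
    if abs(denom) < 1e-9:
        return {}                                    # degenerate triangle
    xmin = max(0, int(np.floor(v2d[:, 0].min())))
    xmax = min(width - 1, int(np.ceil(v2d[:, 0].max())))
    ymin = max(0, int(np.floor(v2d[:, 1].min())))
    ymax = min(height - 1, int(np.ceil(v2d[:, 1].max())))
    covered = {}
    for y in range(ymin, ymax + 1):
        for x in range(xmin, xmax + 1):
            # Barycentric coordinates of the pixel with respect to the three vertices.
            w0 = ((y1 - y2) * (x - x2) + (x2 - x1) * (y - y2)) / denom
            w1 = ((y2 - y0) * (x - x2) + (x0 - x2) * (y - y2)) / denom
            w2 = 1.0 - w0 - w1
            if w0 >= 0 and w1 >= 0 and w2 >= 0:      # pixel lies inside the triangle
                covered[(x, y)] = w0 * uv[0] + w1 * uv[1] + w2 * uv[2]
    return covered
```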
UV unwrapping: all images are represented on a two-dimensional plane, with a horizontal direction U and a vertical direction V. Any pixel on the image can be located by a two-dimensional UV coordinate on this plane. The UV coordinates are the U, V texture map coordinates, which define the position of each pixel on the image and determine, via the correspondence with the three-dimensional model, the position in the texture map of the surface point corresponding to each pixel. Each pixel point on the image is precisely mapped to a point on the surface of the three-dimensional model of the object, and the software performs smooth image interpolation over the gaps between the surface positions corresponding to adjacent pixel points, yielding the UV map. The process of creating a UV map is called UV unwrapping.
A voxel is short for volume element. A volume containing voxels can be represented by volume rendering, or by extracting a polygonal iso-surface at a given threshold. Voxel data are stored in a three-dimensional array, which may also be regarded as a three-dimensional texture.
Mesh, i.e., a polygonal mesh, is a data structure used in computer graphics to model various irregular objects. Among the patches of a polygonal mesh, the triangular patch is the smallest unit of subdivision; because triangular patches are relatively simple, flexible to represent and convenient for topology description, they are widely used, and a mesh is therefore often composed of triangular patches.
An application scenario of the method and apparatus provided by the present disclosure will be described below with reference to fig. 1.
Fig. 1 is a schematic diagram of an application scenario of the material recovery method, the generation method of a three-dimensional model, and the training method and apparatus of a material recovery model according to an embodiment of the disclosure.
As shown in fig. 1, the application scenario 100 of this embodiment may include an electronic device 110, and the electronic device 110 may be various electronic devices with processing functions, including but not limited to a smart phone, a tablet computer, a laptop computer, a desktop computer, a server, and the like.
The electronic device 110 may, for example, process the acquired image 120 to obtain voxel data representing the target object in the image 120. The electronic device 110 may also generate a texture map of the target object from the obtained voxel data, and combine the texture map and the mesh model of the target object to obtain a three-dimensional model 130 of the target object including texture information.
The electronic device 110 may process the image 120, for example, using a deep-learning-based voxelization model, to obtain the voxel data of the target object. The voxel data may also be acquired in advance and stored locally on the electronic device 110 or in a database connected to the electronic device 110, and retrieved from local storage or from the database when the three-dimensional model 130 of the target object needs to be generated.
The mesh model of the target object may be, for example, a triangular patch model, that is, a model formed by stitching together a plurality of triangular patches, with each grid in the grid model being one triangular patch. It will be appreciated that, besides a triangular patch model, the mesh model may be a patch model whose patches have any number of corners, such as a quadrilateral patch model, with each grid being one patch; the present disclosure does not limit this.
In one embodiment, the electronic device 110 may generate a texture map of the target object using, for example, a texture restoration model. After the texture map and the mesh model are obtained, an image of the target object at any viewing angle can be obtained through a rendering engine.
In an embodiment, as shown in fig. 1, the application scenario 100 may further include a server 150, where the server 150 may be, for example, a background management server supporting the operation of a client application in the electronic device 110. Electronic device 110 may be communicatively coupled to server 150 via a network, which may include wired or wireless communication links.
For example, the server 150 may train the texture restoration model based on the plurality of images and, in response to a request from the electronic device 110, send the trained texture restoration model 140 to the electronic device 110 to generate texture information of the target object from the texture restoration model 140 by the electronic device 110. It will be appreciated that in training the texture restoration model, the plurality of images employed are images that comprise the target object. When there are multiple target objects, then a texture recovery model may be trained for each target object.
In an embodiment, the electronic device 110 may also send the acquired voxel data to the server 150, where the server 150 processes the voxel data using the trained texture restoration model to obtain a texture map of the target object, and generates a three-dimensional model of the target object based on the texture map and the mesh model.
It should be noted that, the material recovery method and the three-dimensional model generating method provided in the present disclosure may be executed by the electronic device 110 or may be executed by the server 150. Accordingly, the material recovery device and the three-dimensional model generating device provided by the present disclosure may be provided in the electronic device 110 or may be provided in the server 150. The training method of the texture restoration model provided by the present disclosure may be performed by the server 150. Accordingly, the training device of the texture restoration model provided in the present disclosure may be disposed in the server 150.
It should be understood that the number and type of electronic devices 110 and servers 150 in fig. 1 are merely illustrative. There may be any number and type of electronic devices 110 and servers 150 as desired for implementation.
The material recovery method provided by the present disclosure will be described in detail below with reference to fig. 2 to 3.
As shown in fig. 2, the material recovery method 200 of this embodiment may include operations S210 to S230.
In operation S210, a mesh model of the target object is generated from voxel data for the target object.
According to embodiments of the present disclosure, the voxel data for the target object may represent, for example, a cubic structure of a predetermined size. For example, the voxel data for the target object may form an N×N×N tensor representing a cubic structure of size N×N×N, where each element of the tensor represents one voxel. It is understood that N is a positive integer.
According to embodiments of the present disclosure, the idea of iso-surface-based surface rendering may be adopted to derive the grid model. First, each piece of voxel data is processed in turn, and the iso-surface patch contained in it is determined. The patches of all voxels are then stitched together to form the surface of the entire target object. Here, an iso-surface means the following: if the voxel data is regarded as a set of samples of some physical property over a spatial region, with values at non-sample points estimated by interpolating adjacent samples, then the set of all points in that region sharing a given value of the property defines one or more curved surfaces, which are called iso-surfaces. Specifically, for example, the six faces of boundary voxels in the voxel data may be used to fit the iso-surface, or the grid model may be generated from the voxel data using an iso-surface extraction algorithm such as marching cubes (Marching Cubes, MC) or marching tetrahedra (Marching Tetrahedra, MT).
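As a concrete illustration of the iso-surface extraction step, the following sketch uses the marching cubes implementation in scikit-image. The assumption that the voxel grid stores a scalar field (for example an occupancy value or a symbol distance) and the variable names are choices made for the example, not prescriptions of the disclosure.

```python
import numpy as np
from skimage import measure

def voxels_to_mesh(volume: np.ndarray, level: float = 0.0):
    """volume: (N, N, N) scalar field sampled on the voxel grid.
    level: the iso-surface value (0 for a symbol distance field)."""
    verts, faces, normals, _ = measure.marching_cubes(volume, level=level)
    # verts: (V, 3) vertex coordinates, faces: (F, 3) triangle vertex indices.
    return verts, faces, normals
```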
In operation S220, a pixel position of a pixel point corresponding to each mesh in the mesh model on the texture map for the target object is determined.
According to embodiments of the disclosure, the pixel positions, on the texture map for the target object, of the pixel points corresponding to each grid in the grid model can be obtained by UV-unwrapping the grid model. For example, the grid model may be UV-unwrapped using any of the following UV-unwrapping algorithms or tools: the Blender tool, the RizomUV tool, the MVS-Texturing algorithm, and the like.
It is to be understood that, in operation S220, for example, based on the conversion relationships between the world coordinate system and the camera coordinate system and between the camera coordinate system and the image coordinate system, the three-dimensional coordinates of the vertices of each grid may be converted into two-dimensional coordinates in the image coordinate system according to a preset virtual camera position and a preset viewing-angle direction. The pixels corresponding to the three two-dimensional coordinates obtained from the three vertices of each grid are then connected to form a triangular area, and the pixels contained in that triangular area are taken as the pixels corresponding to that grid, as sketched below.
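A minimal sketch of the coordinate conversion described above follows, assuming a pinhole camera model with a known intrinsic matrix K and extrinsics (R, t) for the preset virtual camera; the function and parameter names are illustrative.

```python
import numpy as np

def project_vertices(vertices_world, K, R, t):
    """vertices_world: (V, 3) vertex coordinates in the world coordinate system.
    K: (3, 3) camera intrinsics; R: (3, 3) rotation and t: (3,) translation
    taking world coordinates into the camera coordinate system."""
    cam = vertices_world @ R.T + t        # world -> camera coordinate system
    uv = cam @ K.T                        # camera -> image plane (homogeneous)
    uv = uv[:, :2] / uv[:, 2:3]           # perspective division -> pixel coordinates
    return uv                             # (V, 2) two-dimensional coordinates
```

The pixels covered by each grid can then be obtained by rasterizing the triangle formed by its three projected vertices, for example with a routine like the rasterization sketch given earlier.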
In operation S230, the mesh model and the pixel position are input into a texture estimation network to obtain a texture map for the target object.
According to embodiments of the present disclosure, a texture map may be represented, for example, by a two-dimensional matrix, where each element of the matrix represents the material information of one pixel. The material information may include, for example, at least one of diffuse reflection color information (Diffuse), roughness information (Roughness) and metallicness information (Metallic). Assuming the material information includes all three, each element of the two-dimensional matrix may be represented by a five-tuple whose components are: the reflectance of the three colors R, G and B, the roughness, and the metallicness. It is understood that the texture map may be output by the texture estimation network.
According to embodiments of the present disclosure, the texture estimation network may be constituted, for example, by a multi-layer perceptron (Multilayer Perceptron, MLP). It is understood that the texture estimation network may also be any other deep neural network. Through training, the material estimation network learns the mapping relationship between each position point of the target object and its material information. In this embodiment, the grid model and the pixel positions of the pixel points corresponding to the grids are input into the material estimation network; the network derives the material information of each grid in the grid model according to the learned mapping relationship and associates that material information with the corresponding pixel points according to their pixel positions, thereby obtaining the material map, as sketched below.
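The sketch below shows one possible multi-layer-perceptron material estimation network in PyTorch. The layer sizes, the raw 3D point input and the Sigmoid output range are assumptions made for illustration and are not specified by the disclosure.

```python
import torch
import torch.nn as nn

class MaterialMLP(nn.Module):
    """Maps a 3D surface point (e.g. the centroid of a grid) to the five-tuple
    described above: diffuse R, G, B reflectance, roughness and metallicness."""
    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 5), nn.Sigmoid(),   # keep all outputs in [0, 1]
        )

    def forward(self, points):                    # points: (M, 3) surface points
        out = self.net(points)
        diffuse, roughness, metallic = out[:, :3], out[:, 3:4], out[:, 4:5]
        return diffuse, roughness, metallic
```

In practice, such networks often also apply a positional encoding to the input points; that refinement is omitted here for brevity.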
According to the embodiment of the disclosure, the texture map is generated by adopting the texture estimation network constructed based on the deep learning technology, so that the accuracy of the obtained texture information can be improved. Furthermore, as the texture map and the grid model are generated, the method of the embodiment of the disclosure can be applied to a scene in which the image of the target object is rendered by adopting a traditional rendering engine, and the robustness of the texture recovery method provided by the disclosure can be improved.
The implementation of operation S210 described above is further extended and defined below in conjunction with fig. 3.
Fig. 3 is a schematic diagram of generating a mesh model of a target object from voxel data according to an embodiment of the disclosure.
According to the embodiment of the disclosure, the grid model of the target object can be generated according to voxel data based on a deep learning method, so that the accuracy of the generated grid model is improved.
For example, the embodiment may set an area surrounded by the surface mesh of the mesh model as a limited area in space, and learn a symbol distance between each sampling point in space and an area boundary of the limited area based on the deep neural network. In this way, the embodiment may determine a plurality of spatial sampling points according to voxel data, and then estimate, based on the deep neural network, a symbol distance of the plurality of spatial sampling points for the target object, that is, a symbol distance between the plurality of spatial sampling points and a region boundary of a region surrounded by the grid model.
Specifically, as shown in fig. 3, the embodiment 300 may take the voxel data 310 of the target object as the input to a deep neural network 320, and the deep neural network 320 outputs the symbol distances 330 of a plurality of spatial sampling points for the target object. The center point of the volume element represented by each piece of voxel data may be taken as the spatial sampling point determined by that voxel data.
The deep neural network 320 may be composed of, for example, MLP. The deep neural network 320, trained, can learn the sign distance field between the spatial sampling points and the target object. Alternatively, the deep neural network 320 may be a fully connected network, which is not limited by the present disclosure.
In one embodiment, when training the deep neural network 320, the network 320 is used to estimate the symbol distance and a feature for each sampling point on a ray emitted from the virtual camera center, the ray corresponding to a pixel in the sample image. The symbol distance and the feature of each sampling point are then input into a color-estimating MLP, which estimates the color information of each sampling point. The deep neural network 320 is optimized with an L1 loss between this color information and the true color information of the corresponding pixel point in the sample image. To make the network more robust in weakly textured regions, the deep neural network 320 may also be optimized by adding geometric prior constraints, as sketched below.
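The following is a highly simplified PyTorch sketch of this training setup: an SDF network that predicts a symbol distance and a feature for each sampled point, a color network fed with the symbol distance and the feature, and an L1 loss against the true pixel color. The way the symbol distances are converted into compositing weights here is purely illustrative (SDF-based renderers use more careful formulations), and all names are assumptions.

```python
import torch
import torch.nn as nn

class SDFNet(nn.Module):            # estimates a symbol distance and a feature vector
    def __init__(self, feat_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 128), nn.ReLU(),
                                 nn.Linear(128, 128), nn.ReLU(),
                                 nn.Linear(128, 1 + feat_dim))
    def forward(self, x):                        # x: (S, 3) sample points on one ray
        out = self.mlp(x)
        return out[:, :1], out[:, 1:]            # symbol distance, feature

class ColorNet(nn.Module):          # estimates color from symbol distance + feature
    def __init__(self, feat_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(1 + feat_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 3), nn.Sigmoid())
    def forward(self, sdf, feat):
        return self.mlp(torch.cat([sdf, feat], dim=-1))

def render_ray(points, sdf_net, color_net, beta=0.1):
    """points: (S, 3) samples along one ray, ordered from near to far."""
    sdf, feat = sdf_net(points)
    color = color_net(sdf, feat)                              # (S, 3)
    # Illustrative weighting: samples whose |sdf| is small (near the surface)
    # contribute most to the composited ray color.
    w = torch.softmax(-sdf.abs().squeeze(-1) / beta, dim=0)
    return (w.unsqueeze(-1) * color).sum(dim=0)               # (3,) ray color

def l1_photo_loss(points, gt_rgb, sdf_net, color_net):
    """L1 loss between the rendered ray color and the true pixel color."""
    pred = render_ray(points, sdf_net, color_net)
    return (pred - gt_rgb).abs().mean()
```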
After deriving the symbol distances 330 for each of the plurality of spatial sampling points determined by the voxel data for the target object, in accordance with an embodiment of the present disclosure, the embodiment 300 may employ the iso-surface extraction algorithm 350 to generate a mesh model 360 of the target object from the plurality of spatial sampling points and the symbol distances 330.
For example, the embodiment may first determine, as the target sampling point 340, a sampling point with a symbol distance of 0 among a plurality of sampling points determined from the voxel data, based on the symbol distance 330. The target sampling points 340 are surface points of the mesh model. Then, according to the target sampling point 340, an iso-surface extraction algorithm 350 is used to generate a three-dimensional patch model, which is the mesh model 360 of the target object.
Based on the material recovery method provided in the present disclosure, the present disclosure further provides a method for generating a three-dimensional model, which will be described in detail below with reference to fig. 4.
Fig. 4 is a flow diagram of a method of generating a three-dimensional model according to an embodiment of the present disclosure.
As shown in fig. 4, the three-dimensional model generation method 400 of this embodiment may include operations S410 to S420.
In operation S410, a texture map for the target object is generated from voxel data for the target object.
The operation S410 may employ the material restoration method described above to generate a material map according to the voxel data, which is not described herein. It will be appreciated that in the process of generating the texture map, a mesh model of the target object, i.e., a three-dimensional patch model of the target object, may be obtained.
In operation S420, a three-dimensional model of the target object is generated from the texture map and the mesh model of the target object.
In this embodiment, for each grid in the grid model of the target object, the material information at the pixel position of its corresponding pixel point on the material map can be assigned to that grid, thereby obtaining the three-dimensional model of the target object, as sketched below. The three-dimensional model can thus represent not only the three-dimensional surface of the target object but also the material information of each surface point on that surface.
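A minimal NumPy sketch of this assignment step is shown below, assuming each face already carries the integer pixel position of its corresponding point on the material map and that the map stores five channels (diffuse R, G, B, roughness, metallicness); the names are illustrative.

```python
import numpy as np

def attach_material(faces_uv, material_map):
    """faces_uv: (F, 2) integer pixel positions (x, y) per face.
    material_map: (H, W, 5) array of per-pixel material information."""
    x = np.clip(faces_uv[:, 0], 0, material_map.shape[1] - 1)
    y = np.clip(faces_uv[:, 1], 0, material_map.shape[0] - 1)
    return material_map[y, x]        # (F, 5): material assigned to each face
```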
In order to facilitate implementation of the above-described material recovery method, the present disclosure further provides a training method of a material recovery model, where the material recovery model includes the above-described material estimation network. The training method of the texture restoration model will be described in detail with reference to fig. 5.
FIG. 5 is a flow chart of a training method of a texture restoration model according to an embodiment of the present disclosure.
As shown in fig. 5, the training method 500 of the texture restoration model of this embodiment includes operations S510 to S550.
In operation S510, a mesh model of the target object is generated from an original image including the target object and the camera pose for the original image.
According to an embodiment of the present disclosure, the original image may be any image acquired in advance as a training sample, which is not limited in this disclosure.
According to embodiments of the present disclosure, the camera pose for the original image may be calculated from the original image itself, for example using a simultaneous localization and mapping (SLAM) algorithm. For example, there may be a plurality of original images; the embodiment may take an image acquired under a predetermined camera pose among the plurality of images as the reference image and combine each of the other images with the reference image into an image pair. Feature points are then extracted from each image of the pair, a matching relationship is established between the feature points of the two images, and the relative camera pose between the two images is computed from this matching relationship, as sketched below. The camera pose of each image other than the reference image is then obtained from the predetermined camera pose and the relative camera pose.
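The sketch below illustrates the feature-matching step with standard OpenCV calls (ORB features, brute-force matching, essential-matrix estimation and pose recovery). It assumes known camera intrinsics K and is only one possible realization, not necessarily the SLAM pipeline used by the disclosure.

```python
import cv2
import numpy as np

def relative_pose(img_ref, img_other, K):
    """Estimate the relative camera pose of img_other with respect to img_ref."""
    orb = cv2.ORB_create(4000)
    kp1, des1 = orb.detectAndCompute(img_ref, None)
    kp2, des2 = orb.detectAndCompute(img_other, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t       # rotation and (up-to-scale) translation
```

Note that the translation recovered this way is only determined up to scale; a full SLAM pipeline resolves scale and drift with additional steps.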
In this embodiment, according to the pose of the camera of the original image, multiple virtual rays passing through the pixel points in the original image may be led out from the position where the virtual camera is located, and sampling may be performed on the multiple virtual rays, so as to obtain multiple spatial sampling points. And taking the plurality of spatial sampling points as central points of a plurality of volume elements to obtain a plurality of voxel data. Subsequently, a mesh model of the target object may be generated from the plurality of voxel data using a similar principle as in operation S210 described above.
In operation S520, pixel positions of pixel points corresponding to respective grids in the grid model on the original image are determined.
This embodiment may use a similar principle to operation S220 described above to obtain correspondence between each grid in the grid model and the pixel points on the original image, thereby obtaining the pixel positions.
In one embodiment, the grid model may be rasterized to obtain pixel locations for each grid in the grid model in a pixel coordinate system.
In operation S530, the mesh model and the pixel positions are input into a texture estimation network to obtain a texture map for the target object.
The implementation principle of this operation S530 is similar to that of the operation S230 described above, and will not be described here again.
In operation S540, a target image including the target object is rendered according to the texture map, the mesh model, and the camera pose.
According to embodiments of the present disclosure, the embodiments may employ a physics-based rendering technique (Physically Based Rendering, PBR) to render the resulting target image. Among them, the properties used by the PBR may include diffuse, roughness, metallic, normal and the like. Where normal is the normal of each mesh in the mesh model. It is to be appreciated that the above-described rendering techniques are merely examples to facilitate understanding of the present disclosure, and that the present disclosure may also employ other rendering techniques to render a resulting target image. According to the embodiment, the camera pose is considered when the target image is generated, so that the observation angle of the target object in the generated target image is consistent with the observation angle of the target object in the original image, the variable quantity when the difference is determined can be reduced, and the training precision and efficiency are improved.
In operation S550, the texture estimation network is trained according to the difference between the target image and the original image.
This embodiment calculates the pixel-wise loss between the target image and the original image and adjusts the network parameters of the material estimation network with the goal of minimizing this loss, thereby training the material estimation network, as sketched below.
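A minimal PyTorch-style sketch of this training step follows; the differentiable renderer is treated as a black box and all names are illustrative.

```python
import torch

def train_step(material_net, renderer, mesh, pixel_pos, original_image,
               camera_pose, optimizer):
    material_map = material_net(mesh, pixel_pos)               # predicted material map
    target_image = renderer(material_map, mesh, camera_pose)   # differentiable PBR render
    loss = (target_image - original_image).abs().mean()        # pixel-wise L1 loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```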
According to the embodiment of the disclosure, the texture estimation network is trained according to the difference between the images, so that the accuracy of predicting the texture map by the texture estimation network can be improved. Based on the method, the fidelity of the three-dimensional model of the generated target object can be improved, the reality of the virtual reality or the augmented reality scene can be improved, and the user experience can be improved.
The principles of training a texture restoration model are further extended and defined below in connection with fig. 6.
FIG. 6 is a schematic diagram of a training material recovery model according to an embodiment of the present disclosure.
In accordance with an embodiment of the present disclosure, as shown in FIG. 6, in this embodiment 600, the texture recovery model includes a deep neural network 610 for estimating symbol distances in addition to a texture estimation network 620. The mesh model of the target object may be generated using a deep neural network 610 for estimating symbol distances.
In this way, when the material recovery model is trained, for example, according to the original image 601 and the camera pose 602 for the original image, a plurality of spatial sampling points 603 corresponding to the pixels in the original image may be obtained by sampling, that is, the spatial points may be collected from the rays directed from the optical center position of the virtual camera to the pixels in the original image, so as to obtain a plurality of spatial sampling points.
The voxel data obtained based on the plurality of spatial sampling points is then processed using the deep neural network 610 to obtain the symbol distances 604 of the plurality of spatial sampling points for the target object. The principle of obtaining the symbol distance 604 is similar to that of obtaining the symbol distance in the description of fig. 3, and will not be described herein.
A mesh model 605 of the target object may then be generated using an isosurface extraction algorithm based on the spatial sampling points and the symbol distances. The principles of generating the mesh model 605 are similar to those described above with respect to fig. 3, and will not be described in detail herein.
After the mesh model is obtained, the embodiment may obtain the pixel positions 606 of the pixel points corresponding to each mesh in the mesh model 605 on the original image by using the operation S520 described above. Subsequently, the pixel locations 606 and the mesh model 605 are input into the texture estimation network 620, and a texture map 607 for the target object may be output by the texture estimation network 620. After obtaining the texture map 607, a target image 608 may be rendered from the texture map 607, the mesh model 605, and the camera pose 602.
After the target image 608 is obtained, the embodiment may calculate the inter-pixel loss between the target image 608 and the original image 601, training the texture estimation network 620 based on the loss.
In an embodiment, after the mesh model 605 is obtained, a reference image 609 including the target object may also be rendered, for example, from the mesh model 605 and the camera pose 602. After the reference image 609 is obtained, the deep neural network 610 may be trained based on the differences between the reference image 609 and the original image 601.
Wherein the mesh model 605 may be projected onto the image plane, for example, based on the camera pose 602, resulting in the reference image 609. It is to be appreciated that the rendering engine may also be employed to render an image of the mesh model 605 for the camera pose 602 as the reference image 609, and the present disclosure is not limited to a method of generating the reference image 609. The resulting reference image 609 may also be rendered in conjunction with an environment map for the original image.
It will be appreciated that the difference between the reference image 609 and the original image 601 may be represented, for example, by an inter-pixel loss. This embodiment may train the deep neural network 610 with the goal of minimizing the difference between the reference image 609 and the original image 601.
In one embodiment, the deep neural network 610 may be trained using the original image as a training sample, and the texture estimation network 620 may be trained using the original image with the loss of the deep neural network 610 converging. This is due to the typically low accuracy requirements for the deep neural network 610 that generates the mesh model. Alternatively, the deep neural network 610 and the texture estimation network 620 may be trained synchronously.
The embodiment of the disclosure can enable the grid model to be generated based on a deep learning method by training the deep neural network for estimating the symbol distance, thereby generating a high-quality grid model.
Operation S550 described above will be further expanded and defined below in connection with fig. 7-8.
Fig. 7 is a schematic diagram of training a texture estimation network according to an embodiment of the present disclosure.
According to embodiments of the present disclosure, in training a texture estimation network, for example, a mask image for a target object may be generated first, and an original image and a target image are processed based on the mask image to compare only image portions for the target object when determining differences between the original image and the target image. Therefore, the influence of the environmental information on the training of the material estimation network can be avoided, the accuracy of the material estimation network is improved, and the training efficiency is improved. This is because the texture estimation network only estimates texture information of the target object.
As shown in fig. 7, after obtaining a mesh model 701 using operation S510 described above, the embodiment 700 may render a reference image 703 including a target object according to the mesh model 701 and a camera pose 702. The principle of rendering the reference image 703 is similar to that of rendering the reference image described above, and will not be described here.
The embodiment 700 may generate a mask image 704 for the target object based on the position of the target object in the reference image 703. The mask image 704 may be a binary image, where the mask value of the region corresponding to the target object is 1, and the mask values of the other regions are 0.
After obtaining the mask image 704, the embodiment 700 may mask the target image 705 with the mask image to obtain a first masked image 707, and mask the original image 706 with the mask image to obtain a second masked image 708. For example, the mask processing may be implemented by multiplying the mask value of each pixel point in the mask image by the pixel values of the corresponding pixel points in the target image 705 and the original image 706.
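The masking operation can be sketched as follows in NumPy, assuming the mask is a binary array broadcastable over the color channels; names are illustrative.

```python
import numpy as np

def masked_pair(target_image, original_image, mask):
    """mask: (H, W) or (H, W, 1) array with 1 inside the target object, 0 elsewhere."""
    mask = mask[..., None] if mask.ndim == 2 else mask
    first_image = target_image * mask        # masked target image
    second_image = original_image * mask     # masked original image
    return first_image, second_image
```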
This embodiment, after obtaining the first image 707 and the second image 708, may train the texture estimation network 710 based on the difference between the two images.
In an embodiment, the material estimation network may also be trained by using the color gradient of each pixel point in the original image as a supervisory signal. This is because the color gradient and the material gradient of each pixel point generally tend to be consistent, and the difference of materials can be generally represented by colors. Therefore, the embodiment can train the material estimation network from multiple dimensions, which is beneficial to improving the training efficiency and accuracy of the material estimation network, thereby reducing the material seams in the generated material map.
For example, in the embodiment, when training the texture restoration model, the Color Gradient (Color Gradient) of each pixel point in the original image may be determined in real time, or the Color Gradient may be acquired from a storage space in which the Color Gradient is stored in advance. After the texture map is obtained through the processing of the texture estimation network, the texture gradient of each pixel point in the texture map can be determined. It will be appreciated that the color gradient and texture gradient may be determined, for example, by invoking a preset gradient interface.
After the color gradient and the material gradient are obtained, the loss of the material estimation network can be determined according to the corresponding relation between the pixel points in the original image and the pixel points in the material map and according to the relative difference between the color gradient of the first pixel point in the original image and the material gradient of the second pixel point corresponding to the first pixel point in the material map. The texture estimation network is then trained with the goal of minimizing this loss.
In an embodiment, when determining the color gradient and the material gradient, for the convenience of calculation, the color gradient of each pixel point in the width direction and the height direction of the original image may be determined, so as to obtain the color gradient in the horizontal direction and the color gradient in the vertical direction. Similarly, a horizontal texture gradient and a vertical texture gradient for each pixel in the texture map may be determined. And taking a weighted sum of a first relative difference between the horizontal color gradient of the first pixel point and the horizontal material gradient of the second pixel point and a second relative difference between the vertical color gradient of the first pixel point and the vertical material gradient of the second pixel point as a loss of the material estimation network. The weights used in calculating the weighted sum may be set according to actual requirements, which is not limited in this disclosure. The present disclosure determines the loss by decomposing the gradient into two directions, and can improve the accuracy and computational efficiency of the determined loss.
In an embodiment, the horizontal direction may be taken as the first direction and the vertical direction as the second direction. Let the color gradient of the first pixel point in the first direction be the first gradient, the color gradient of the first pixel point in the second direction be the second gradient, the material gradient of the second pixel point in the first direction be the third gradient, and the material gradient of the second pixel point in the second direction be the fourth gradient. When training the material estimation network, this embodiment may, for example, determine a first gradient weight for the first direction according to the first gradient of the first pixel point, and determine a second gradient weight for the second direction according to the second gradient of the first pixel point. For example, the first gradient weight may be negatively correlated with the first gradient, and the second gradient weight negatively correlated with the second gradient. The embodiment may then determine a weighted sum of the third gradient and the fourth gradient of the second pixel point according to the first gradient weight and the second gradient weight, take this weighted sum as the loss of the material estimation network, and train the network based on it. With this arrangement, when the material gradient is large while the color gradient is small, the loss of the material estimation network becomes large; training the network with this loss drives the material gradient to vary consistently with the color gradient, which improves the precision of the trained material estimation network and thus the accuracy of the obtained material map.
In one embodiment, the relationship between a gradient weight and the corresponding gradient can be expressed by an exponential function. For example, if the first gradient is Idx, the first gradient weight weight_dx for the first direction can be expressed by formula (1); similarly, if the second gradient is Idy, the second gradient weight weight_dy for the second direction can be expressed by formula (2):
weight_dx = e^(-λ·Idx)    formula (1);
weight_dy = e^(-λ·Idy)    formula (2).
Wherein λ is a hyper-parameter that can be set according to actual requirements.
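Putting formulas (1) and (2) together with the gradient supervision described above, a PyTorch sketch of the resulting loss might look as follows. The reduction over color channels, the plain (unweighted) sum over the material channels and the default value of λ are assumptions made for illustration.

```python
import torch

def gradient_loss(original_image, material_map, lambda_=1.0):
    """original_image: (H, W, 3) colors; material_map: (H, W, C) material channels
    (e.g. diffuse color, roughness, metallicness) aligned pixel-to-pixel."""
    def dx(t): return t[:, 1:, :] - t[:, :-1, :]    # horizontal (width) gradient
    def dy(t): return t[1:, :, :] - t[:-1, :, :]    # vertical (height) gradient

    # First/second gradients: color gradients of the original image.
    i_dx = dx(original_image).abs().mean(dim=-1, keepdim=True)
    i_dy = dy(original_image).abs().mean(dim=-1, keepdim=True)
    w_dx = torch.exp(-lambda_ * i_dx)               # formula (1)
    w_dy = torch.exp(-lambda_ * i_dy)               # formula (2)

    # Third/fourth gradients: material gradients, here simply summed over the
    # material channels (the disclosure also mentions weighted sums or averages).
    m_dx = dx(material_map).abs().sum(dim=-1, keepdim=True)
    m_dy = dy(material_map).abs().sum(dim=-1, keepdim=True)

    # Large material gradients where the color gradient is small incur a large loss.
    return (w_dx * m_dx).mean() + (w_dy * m_dy).mean()
```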
In an embodiment, the texture information may include at least two of the following: diffuse reflection color information, roughness information, and metallicity information. In this embodiment, when determining the material gradient, the gradient may be calculated for each material information, resulting in at least two gradients for at least two information. Then, the embodiment may determine a texture gradient of the second pixel point in the texture map according to the at least two gradients of the second pixel point. For example, a weighted sum of at least two gradients may be used as the material gradient of the second pixel, and the weight during weighting may be set according to the actual requirement. It should be understood that the above manner of taking the weighted sum of at least two gradients as the material gradient is merely taken as an example to facilitate understanding of the disclosure, and the disclosure may also take, for example, an average value of at least two gradients or the like as the material gradient of the second pixel, which is not limited in this disclosure.
In an embodiment, a loss value of the texture estimation network may be calculated according to a texture gradient for each texture information by using the method described above, and a sum of at least two loss values calculated from at least two texture information may be used to represent a loss of the texture estimation network. It will be appreciated that the texture gradient of each texture information may include the gradients described above for the first direction and the gradients described above for the second direction.
Based on the material recovery method provided by the present disclosure, the present disclosure further provides a material recovery device, which will be described in detail below with reference to fig. 8.
Fig. 8 is a block diagram of a texture restoration device according to an embodiment of the present disclosure.
As shown in fig. 8, the texture restoration apparatus 800 of this embodiment may include a model generation module 810, a pixel location determination module 820, and a map obtaining module 830.
Model generation module 810 may be used to generate a mesh model for a target object based on voxel data for the target object. In an embodiment, the model generating module 810 may be configured to perform the operation S210 described above, which is not described herein.
The pixel position determining module 820 is configured to determine a pixel position of a pixel point corresponding to each grid in the grid model on the texture map for the target object. In an embodiment, the pixel location determining module 820 may be used to perform the operation S220 described above, which is not described herein.
The map obtaining module 830 is configured to input the mesh model and the pixel position into a texture estimation network, so as to obtain a texture map for the target object. In an embodiment, the map obtaining module 830 may be configured to perform the operation S230 described above, which is not described herein.
According to an embodiment of the present disclosure, the model generating module 810 includes: the symbol distance obtaining sub-module is used for processing the voxel data by adopting a deep neural network to obtain symbol distances of a plurality of space sampling points aiming at the target object, wherein the space sampling points are determined according to the voxel data; and the model generation sub-module is used for generating a grid model of the target object by adopting an isosurface extraction algorithm according to the plurality of spatial sampling points and the symbol distance.
Based on the method for generating the three-dimensional model provided by the present disclosure, the present disclosure further provides a device for generating the three-dimensional model, which will be described in detail below with reference to fig. 9.
Fig. 9 is a block diagram of a structure of a three-dimensional model generating apparatus according to an embodiment of the present disclosure.
As shown in fig. 9, the three-dimensional model generating apparatus 900 of this embodiment may include a map generating module 910 and a model generating module 920.
The map generation module 910 is configured to generate a texture map for a target object according to voxel data for the target object, where the texture map is obtained by using the texture restoration device provided by the present disclosure. In an embodiment, the map generation module 910 may be configured to perform the operation S410 described above, and details are not repeated here.
The model generation module 920 is configured to generate a three-dimensional model of the target object according to the texture map and the mesh model of the target object, where the mesh model is generated from the voxel data. In an embodiment, the model generation module 920 may be configured to perform the operation S420 described above, and details are not repeated here.
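As a concrete but non-limiting illustration of assembling the mesh model and the texture map into a three-dimensional asset, the sketch below writes a Wavefront OBJ/MTL pair. The file names, the per-vertex UV layout, and the use of map_Kd for the diffuse channel are assumptions for the example; roughness and metallicity channels are typically stored through renderer-specific PBR extensions and are omitted here.

```python
import numpy as np
from PIL import Image

def export_textured_mesh(vertices, faces, uvs, diffuse_map, prefix="object"):
    """Write a mesh with per-vertex UV coordinates and a diffuse texture as OBJ/MTL.

    vertices: (V, 3) float array; faces: (F, 3) int array of vertex indices;
    uvs: (V, 2) float array in [0, 1]; diffuse_map: (H, W, 3) float array in [0, 1].
    """
    texture_name = f"{prefix}_diffuse.png"
    Image.fromarray((np.clip(diffuse_map, 0.0, 1.0) * 255).astype(np.uint8)).save(texture_name)

    with open(f"{prefix}.mtl", "w") as mtl:
        mtl.write("newmtl recovered_material\n")
        mtl.write(f"map_Kd {texture_name}\n")  # diffuse colour channel only

    with open(f"{prefix}.obj", "w") as obj:
        obj.write(f"mtllib {prefix}.mtl\nusemtl recovered_material\n")
        for x, y, z in vertices:
            obj.write(f"v {x} {y} {z}\n")
        for u, v in uvs:
            obj.write(f"vt {u} {v}\n")
        for a, b, c in faces:  # OBJ indices are 1-based
            obj.write(f"f {a+1}/{a+1} {b+1}/{b+1} {c+1}/{c+1}\n")
```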
Based on the training method of the material recovery model provided by the present disclosure, the present disclosure further provides a training device of the material recovery model, and the device will be described in detail below with reference to fig. 10.
FIG. 10 is a block diagram of a training apparatus for a texture restoration model according to an embodiment of the present disclosure.
As shown in fig. 10, the training apparatus 1000 of the texture restoration model of this embodiment may include a model generation module 1010, a pixel position determination module 1020, a map obtaining module 1030, a first image rendering module 1040, and a training module 1050. The material recovery model comprises the material estimation network.
The model generation module 1010 is configured to generate a mesh model of the target object according to an original image including the target object and a camera pose for the original image. In an embodiment, the model generation module 1010 may be configured to perform the operation S510 described above, and details are not repeated here.
The pixel position determining module 1020 is configured to determine a pixel position, on the original image, of the pixel point corresponding to each grid in the grid model. In an embodiment, the pixel position determining module 1020 may be configured to perform the operation S520 described above, and details are not repeated here.
The map obtaining module 1030 is configured to input the mesh model and the pixel positions into a texture estimation network to obtain a texture map for the target object. In an embodiment, the map obtaining module 1030 may be used to perform the operation S530 described above, and details are not repeated here.
The first image rendering module 1040 is configured to render a target image including the target object according to the texture map, the mesh model, and the camera pose. In an embodiment, the first image rendering module 1040 may be used to perform the operation S540 described above, and details are not repeated here.
The training module 1050 is configured to train the texture estimation network according to the difference between the target image and the original image. In an embodiment, the training module 1050 may be used to perform the operation S550 described above, and details are not repeated here.
According to an embodiment of the present disclosure, the apparatus 1000 may further include: a second image rendering module for rendering a reference image including the target object according to the grid model and the camera pose; and a mask image generation module for generating a mask image for the target object according to the position of the target object in the reference image. The training module 1050 may include: a mask processing sub-module for performing mask processing on the target image and the original image respectively by using the mask image, to obtain a first image and a second image after mask processing; and a training sub-module for training the texture estimation network according to the difference between the first image and the second image.
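As a non-limiting illustration of the mask processing described above, the sketch below multiplies both the rendered target image and the original image by the object mask and measures their difference only over the masked region. The (C, H, W) shapes, the binary mask convention, and the L1 comparison normalised by the mask area are assumptions made for the example.

```python
import torch

def masked_image_loss(target_image, original_image, mask_image):
    """Mask the rendered image and the original image with the object mask,
    then measure their difference only inside the mask.

    target_image, original_image: (C, H, W) tensors.
    mask_image: (H, W) tensor with values in {0, 1} marking pixels covered
    by the target object.
    """
    mask = mask_image.unsqueeze(0)        # broadcast the mask over channels
    first_image = target_image * mask     # masked rendering
    second_image = original_image * mask  # masked photograph
    # Normalise by the mask area so the loss does not shrink for small objects.
    return (first_image - second_image).abs().sum() / mask.sum().clamp(min=1)
```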
According to an embodiment of the present disclosure, the apparatus 1000 may further include a material gradient determining module for determining the material gradient of each pixel point in the material map. The training module 1050 may be further configured to train the texture estimation network according to the color gradient of a first pixel point in the original image and the material gradient of a second pixel point corresponding to the first pixel point in the material map.
According to an embodiment of the present disclosure, the color gradient includes a first gradient in a first direction and a second gradient in a second direction; the material gradient includes a third gradient in the first direction and a fourth gradient in the second direction; and the first direction and the second direction are perpendicular to each other. The training module 1050 may include: a gradient weight determining sub-module for determining, according to the first gradient and the second gradient of the first pixel point, a first gradient weight for the first direction and a second gradient weight for the second direction, respectively; a weighting sub-module for determining a weighted sum of the third gradient and the fourth gradient of the second pixel point according to the first gradient weight and the second gradient weight; and a training sub-module for training the material estimation network according to the weighted sum.
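As a non-limiting illustration of these sub-modules, the sketch below derives a gradient weight for each direction from the colour gradient with an exponential fall-off, so that the weight shrinks as the colour gradient grows, and uses the weighted sum of the material gradients as a smoothness term. The exponential form and the absolute-value aggregation are assumptions; any weighting that decreases with the colour gradient would fit the description above. The material gradients may come from a per-channel combination such as the earlier sketch.

```python
import torch

def edge_aware_smoothness(material_grad_x, material_grad_y,
                          color_grad_x, color_grad_y):
    """Weighted sum of the material gradients, where each direction's weight
    decreases as the colour gradient in that direction grows, so material
    values may change across image edges but stay smooth elsewhere.

    material_grad_x, material_grad_y: (H, W) material gradients in two
    perpendicular directions.
    color_grad_x, color_grad_y: (C, H, W) colour gradients of the original image.
    """
    # exp(-|grad|) is one choice of a weight that is inversely related to the
    # colour gradient; any monotonically decreasing mapping would also work.
    weight_x = torch.exp(-color_grad_x.abs().mean(dim=0))
    weight_y = torch.exp(-color_grad_y.abs().mean(dim=0))
    return (weight_x * material_grad_x.abs()
            + weight_y * material_grad_y.abs()).mean()
```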
According to an embodiment of the disclosure, each pixel in the texture map includes at least two of the following texture information: diffuse reflection color information, roughness information, and metallicity information. The gradient determining module may include: the first determining submodule is used for determining gradients of each piece of information in at least two pieces of information of each pixel point in the texture map to obtain at least two gradients; and the second determining submodule is used for determining the material gradient of each pixel point in the material map according to at least two gradients.
According to an embodiment of the present disclosure, the material recovery model further includes a deep neural network for estimating signed distances. The model generation module 1010 may include: a sampling sub-module for sampling, according to the original image and the camera pose, a plurality of spatial sampling points corresponding to the pixel points in the original image; a signed distance determining sub-module for processing voxel data obtained based on the plurality of spatial sampling points by using the deep neural network, to obtain signed distances of the spatial sampling points for the target object; and a model generation sub-module for generating a grid model of the target object by using an iso-surface extraction algorithm according to the spatial sampling points and the signed distances.
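As a non-limiting illustration of the sampling sub-module, the sketch below casts a ray from the camera centre through every pixel of the original image and places spatial sampling points at uniformly spaced depths along each ray. The pinhole intrinsics, the depth range, and the number of samples per ray are assumptions for the example; the disclosure does not prescribe a particular sampling scheme.

```python
import torch

def sample_points_along_rays(height, width, intrinsics, cam_to_world,
                             near=0.1, far=4.0, samples_per_ray=32):
    """Spatial sampling points for every pixel of an image.

    intrinsics:   (3, 3) pinhole camera matrix.
    cam_to_world: (4, 4) camera pose for the original image.
    Returns a tensor of shape (H, W, samples_per_ray, 3) in world coordinates.
    """
    v, u = torch.meshgrid(torch.arange(height, dtype=torch.float32),
                          torch.arange(width, dtype=torch.float32),
                          indexing="ij")
    ones = torch.ones_like(u)
    pixels = torch.stack([u + 0.5, v + 0.5, ones], dim=-1)     # (H, W, 3)
    directions = pixels @ torch.linalg.inv(intrinsics).T       # camera space
    directions = directions @ cam_to_world[:3, :3].T           # world space
    origins = cam_to_world[:3, 3].expand_as(directions)
    depths = torch.linspace(near, far, samples_per_ray)
    points = (origins[..., None, :]
              + directions[..., None, :] * depths[:, None])
    return points
```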
According to an embodiment of the present disclosure, the apparatus 1000 may further include a third image rendering module for rendering a reference image including the target object according to the grid model and the camera pose. The training module 1050 described above may also be used to train the deep neural network according to the difference between the reference image and the original image.
In the technical solutions of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and application of the personal information of users all comply with the provisions of relevant laws and regulations, necessary security measures are taken, and public order and good customs are not violated. In the technical solutions of the present disclosure, the authorization or consent of the user is obtained before the personal information of the user is obtained or collected.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 11 shows a schematic block diagram of an example electronic device 1100 that may be used to implement the methods of embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 11, the apparatus 1100 includes a computing unit 1101 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1102 or a computer program loaded from a storage unit 1108 into a Random Access Memory (RAM) 1103. In the RAM 1103, various programs and data required for the operation of the device 1100 can also be stored. The computing unit 1101, ROM 1102, and RAM 1103 are connected to each other by a bus 1104. An input/output (I/O) interface 1105 is also connected to bus 1104.
Various components in device 1100 are connected to I/O interface 1105, including: an input unit 1106 such as a keyboard, a mouse, etc.; an output unit 1107 such as various types of displays, speakers, and the like; a storage unit 1108, such as a magnetic disk, optical disk, etc.; and a communication unit 1109 such as a network card, modem, wireless communication transceiver, or the like. The communication unit 1109 allows the device 1100 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 1101 may be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 1101 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 1101 performs the methods and processes described above, for example at least one of the material recovery method, the three-dimensional model generation method, and the training method of the material recovery model. For example, in some embodiments, at least one of these methods may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 1108. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1100 via the ROM 1102 and/or the communication unit 1109. When the computer program is loaded into the RAM 1103 and executed by the computing unit 1101, one or more steps of at least one of the methods described above may be performed. Alternatively, in other embodiments, the computing unit 1101 may be configured to perform at least one of the material recovery method, the three-dimensional model generation method, and the training method of the material recovery model by any other suitable means (for example, by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chips (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include being implemented in one or more computer programs, where the one or more computer programs may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and overcomes the defects of difficult management and weak service scalability existing in conventional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system or a server combined with a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (18)

1. A method of training a material recovery model, wherein the material recovery model comprises a material estimation network; the method comprises the following steps:
generating a first grid model of a target object according to an original image comprising the target object and a camera pose aiming at the original image;
determining a first pixel position of a pixel point corresponding to each grid in the first grid model on the original image;
inputting the first grid model and the first pixel position into a material estimation network to obtain a first material map for the target object;
rendering to obtain a target image comprising the target object according to the first material map, the first grid model and the camera pose; and
training the material estimation network based on the difference between the target image and the original image,
wherein the method further comprises:
determining the material gradient of each pixel point in the first material map; and
training the material estimation network according to the color gradient of a first pixel point in the original image and the material gradient of a second pixel point corresponding to the first pixel point in the first material map;
wherein the color gradient includes a first gradient in a first direction and a second gradient in a second direction; the material gradient comprises a third gradient in the first direction and a fourth gradient in the second direction; the first direction and the second direction are perpendicular to each other; the training the material estimation network according to the color gradient of the first pixel point in the original image and the material gradient of the second pixel point corresponding to the first pixel point in the first material map includes:
determining a first gradient weight for the first direction and a second gradient weight for the second direction according to the first gradient and the second gradient of the first pixel point, wherein the first gradient weight is inversely related to the first gradient and the second gradient weight is inversely related to the second gradient;
determining a weighted sum of the third gradient and the fourth gradient of the second pixel point according to the first gradient weight and the second gradient weight; and
taking the weighted sum as the loss of the material estimation network, and training the material estimation network.
2. The method of claim 1, further comprising:
rendering to obtain a reference image comprising the target object according to the first grid model and the camera pose; and
generating a mask image for the target object according to the position of the target object in the reference image;
wherein training the material estimation network according to the difference between the target image and the original image comprises:
performing mask processing on the target image and the original image by adopting the mask image to obtain a first image and a second image after mask processing; and
training the material estimation network according to the difference between the first image and the second image.
3. The method of claim 1, wherein each pixel point in the first material map comprises at least two of the following kinds of material information: diffuse reflection color information, roughness information, and metallicity information; the determining the material gradient of each pixel point in the first material map includes:
determining gradients of each piece of information in the at least two pieces of information of each pixel point in the first material map to obtain at least two gradients; and
determining the material gradient of each pixel point in the first material map according to the at least two gradients.
4. The method of claim 1, wherein the material recovery model further comprises a deep neural network for estimating signed distances; the generating a first grid model of a target object according to an original image comprising the target object and a camera pose for the original image comprises:
sampling to obtain a plurality of first space sampling points corresponding to pixel points in the original image according to the original image and the camera pose;
processing voxel data obtained based on the plurality of first space sampling points by adopting the deep neural network to obtain first signed distances of the plurality of first space sampling points for the target object; and
generating a first grid model of the target object by adopting an isosurface extraction algorithm according to the plurality of first space sampling points and the first signed distances.
5. The method of claim 4, further comprising:
rendering to obtain a reference image comprising the target object according to the first grid model and the camera pose; and
training the deep neural network according to the difference between the reference image and the original image.
6. A material recovery method, comprising:
generating a second grid model of the target object according to voxel data of the target object;
determining a second pixel position of a pixel point corresponding to each grid in the second grid model on a second material map aiming at the target object; and
inputting the second grid model and the second pixel position into a material estimation network to obtain a second material map for the target object,
wherein the material estimation network is a material estimation network in a material recovery model trained by the method of any one of claims 1 to 5.
7. The method of claim 6, wherein the generating a second grid model of the target object from voxel data for the target object comprises:
processing the voxel data by adopting a deep neural network to obtain second signed distances of a plurality of second spatial sampling points for the target object, wherein the plurality of second spatial sampling points are determined from the voxel data; and
generating a second grid model of the target object by adopting an isosurface extraction algorithm according to the plurality of second spatial sampling points and the second signed distances.
8. A method of generating a three-dimensional model, comprising:
generating a second material map for the target object according to voxel data for the target object; and
generating a three-dimensional model of the target object according to the second material map and the second grid model of the target object,
wherein the second material map is obtained by the method of any one of claims 6 to 7; the second grid model is generated from the voxel data.
9. A training device of a material recovery model, wherein the material recovery model comprises a material estimation network; the device comprises:
the model generation module is used for generating a first grid model of the target object according to an original image comprising the target object and a camera pose aiming at the original image;
the pixel position determining module is used for determining a first pixel position of a pixel point corresponding to each grid in the first grid model on the original image;
the map obtaining module is used for inputting the first grid model and the first pixel position into a material estimation network to obtain a first material map for the target object;
the first image rendering module is used for rendering a target image comprising the target object according to the first material map, the first grid model and the camera pose; and
a training module for training the material estimation network according to the difference between the target image and the original image,
the device further comprises a material gradient determining module, which is used for determining the material gradient of each pixel point in the first material map;
the training module is further configured to train the material estimation network according to a color gradient of a first pixel point in the original image and a material gradient of a second pixel point corresponding to the first pixel point in the first material map;
wherein the color gradient includes a first gradient in a first direction and a second gradient in a second direction; the material gradient comprises a third gradient in the first direction and a fourth gradient in the second direction; the first direction and the second direction are perpendicular to each other; the training module comprises:
a gradient weight determining sub-module, configured to determine a first gradient weight for the first direction and a second gradient weight for the second direction according to the first gradient and the second gradient of the first pixel point, respectively; the first gradient weight is inversely related to the first gradient and the second gradient weight is inversely related to the second gradient;
a weighting sub-module, configured to determine a weighted sum of the third gradient and the fourth gradient of the second pixel point according to the first gradient weight and the second gradient weight; and
the training sub-module is used for taking the weighted sum as the loss of the material estimation network and training the material estimation network.
10. The apparatus of claim 9, further comprising:
the second image rendering module is used for rendering a reference image comprising the target object according to the first grid model and the camera pose; and
a mask image generating module, configured to generate a mask image for the target object according to the position of the target object in the reference image;
wherein the training module includes:
the mask processing sub-module is used for respectively carrying out mask processing on the target image and the original image by adopting the mask image to obtain a first image and a second image after mask processing; and
the training sub-module is used for training the material estimation network according to the difference between the first image and the second image.
11. The apparatus of claim 9, wherein each pixel point in the first material map comprises at least two of the following kinds of material information: diffuse reflection color information, roughness information, and metallicity information; the material gradient determining module includes:
the first determining submodule is used for determining the gradient of each piece of information in the at least two pieces of information of each pixel point in the first texture map to obtain at least two gradients; and
the second determining submodule is used for determining the material gradient of each pixel point in the first material map according to the at least two gradients.
12. The apparatus of claim 9, wherein the material recovery model further comprises a deep neural network for estimating signed distances; the model generation module comprises:
the sampling sub-module is used for sampling to obtain a plurality of first space sampling points corresponding to the pixel points in the original image according to the original image and the camera pose;
the signed distance determining submodule is used for processing voxel data obtained based on the plurality of first space sampling points by adopting the deep neural network to obtain first signed distances of the plurality of first space sampling points for the target object; and
the model generation sub-module is used for generating a first grid model of the target object by adopting an isosurface extraction algorithm according to the plurality of first space sampling points and the first signed distances.
13. The apparatus of claim 12, further comprising:
a third image rendering module for rendering a reference image including the target object according to the first grid model and the camera pose,
wherein the training module is further used for training the deep neural network according to the difference between the reference image and the original image.
14. A texture restoration device, comprising:
the model generation module is used for generating a second grid model of the target object according to voxel data aiming at the target object;
the pixel position determining module is used for determining a second pixel position of a pixel point corresponding to each grid in the second grid model on a second material map aiming at the target object; and
a map obtaining module, configured to input the second grid model and the second pixel position into a material estimation network to obtain a second material map for the target object,
wherein the material estimation network is a material estimation network in a material recovery model trained by the apparatus according to any one of claims 9 to 13.
15. The apparatus of claim 14, wherein the model generation module comprises:
the symbol distance obtaining sub-module is used for processing the voxel data by adopting a deep neural network for estimating the symbol distance to obtain second symbol distances of a plurality of second space sampling points aiming at the target object; wherein the plurality of second spatial sampling points are determined from the voxel data; and
and the model generation sub-module is used for generating a second grid model of the target object by adopting an isosurface extraction algorithm according to the plurality of second space sampling points and the second symbol distance.
16. A three-dimensional model generation device, comprising:
the map generation module is used for generating a second material map for the target object according to voxel data for the target object; and
a model generation module for generating a three-dimensional model of the target object according to the second material map and a second grid model of the target object,
wherein the second material map is obtained by using the apparatus of any one of claims 14 to 15; the second grid model is generated from the voxel data.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
18. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-8.
CN202211029148.9A 2022-08-25 2022-08-25 Material recovery method, three-dimensional model generation method and model training method Active CN115375847B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211029148.9A CN115375847B (en) 2022-08-25 2022-08-25 Material recovery method, three-dimensional model generation method and model training method


Publications (2)

Publication Number Publication Date
CN115375847A CN115375847A (en) 2022-11-22
CN115375847B true CN115375847B (en) 2023-08-29

Family

ID=84067827

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211029148.9A Active CN115375847B (en) 2022-08-25 2022-08-25 Material recovery method, three-dimensional model generation method and model training method

Country Status (1)

Country Link
CN (1) CN115375847B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117036569B (en) * 2023-10-08 2024-01-30 北京渲光科技有限公司 Three-dimensional model color generation network training method, color generation method and device


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11138789B1 (en) * 2019-06-25 2021-10-05 A9.Com, Inc. Enhanced point cloud for three-dimensional models
CN110570503A (en) * 2019-09-03 2019-12-13 浙江大学 Method for acquiring normal vector, geometry and material of three-dimensional object based on neural network
CN112184899A (en) * 2020-11-06 2021-01-05 中山大学 Three-dimensional reconstruction method based on symbolic distance function
CN112330654A (en) * 2020-11-16 2021-02-05 北京理工大学 Object surface material acquisition device and method based on self-supervision learning model
CN112634156A (en) * 2020-12-22 2021-04-09 浙江大学 Method for estimating material reflection parameter based on portable equipment collected image
CN114882313A (en) * 2022-05-17 2022-08-09 阿波罗智能技术(北京)有限公司 Method and device for generating image annotation information, electronic equipment and storage medium
CN114842121A (en) * 2022-06-30 2022-08-02 北京百度网讯科技有限公司 Method, device, equipment and medium for generating mapping model training and mapping

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Texture Mapping Methods for Three-Dimensional Models Based on Multi-Angle Images; Wang Miao; China Master's and Doctoral Dissertations Full-text Database (Electronic Journal), Information Science and Technology Series; main text, pages 7-54 *

Also Published As

Publication number Publication date
CN115375847A (en) 2022-11-22

Similar Documents

Publication Publication Date Title
CN109003325B (en) Three-dimensional reconstruction method, medium, device and computing equipment
CN115100339B (en) Image generation method, device, electronic equipment and storage medium
CN115082639B (en) Image generation method, device, electronic equipment and storage medium
US8711143B2 (en) System and method for interactive image-based modeling of curved surfaces using single-view and multi-view feature curves
US10726599B2 (en) Realistic augmentation of images and videos with graphics
CN115409933B (en) Multi-style texture mapping generation method and device
CN111612882B (en) Image processing method, image processing device, computer storage medium and electronic equipment
CN107464286B (en) Method, device, equipment and readable medium for repairing holes in three-dimensional city model
CN111583381B (en) Game resource map rendering method and device and electronic equipment
CN115330940B (en) Three-dimensional reconstruction method, device, equipment and medium
CN114373056A (en) Three-dimensional reconstruction method and device, terminal equipment and storage medium
CN115375823B (en) Three-dimensional virtual clothing generation method, device, equipment and storage medium
CN115439607A (en) Three-dimensional reconstruction method and device, electronic equipment and storage medium
CN115018992A (en) Method and device for generating hair style model, electronic equipment and storage medium
CN115375847B (en) Material recovery method, three-dimensional model generation method and model training method
CN112529097A (en) Sample image generation method and device and electronic equipment
CN116385619B (en) Object model rendering method, device, computer equipment and storage medium
CN115222895B (en) Image generation method, device, equipment and storage medium
CN115619986B (en) Scene roaming method, device, equipment and medium
CN115359170A (en) Scene data generation method and device, electronic equipment and storage medium
JP2023527438A (en) Geometry Recognition Augmented Reality Effect Using Real-time Depth Map
CN115953553B (en) Avatar generation method, apparatus, electronic device, and storage medium
CN116012666B (en) Image generation, model training and information reconstruction methods and devices and electronic equipment
CN115774896B (en) Data simulation method, device, equipment and storage medium
CN115761123B (en) Three-dimensional model processing method, three-dimensional model processing device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant