CN114373056A - Three-dimensional reconstruction method and device, terminal equipment and storage medium

Info

Publication number: CN114373056A
Application number: CN202111553514.6A
Original language: Chinese (zh)
Inventors: 陶大鹏 (Tao Dapeng), 石宇航 (Shi Yuhang)
Applicant/Assignee: Yunnan United Visual Technology Co., Ltd.
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 - Three-dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20 - Finite element generation, e.g. wire-frame surface description, tessellation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G06T7/13 - Edge detection


Abstract

The application is applicable to the field of three-dimensional reconstruction, and provides a three-dimensional reconstruction method, a three-dimensional reconstruction device, a terminal device and a storage medium. The method comprises the following steps: acquiring an image to be processed of a target object; performing three-dimensional model vertex prediction on the image to be processed to obtain predicted vertex coordinates; acquiring a diffuse reflection map of the image to be processed, separating the high-frequency and low-frequency information contained in the diffuse reflection map, processing each separately, and then superposing them to form an image key frequency information feature map; correcting the predicted vertex coordinates using the image key frequency information feature map, and constructing a three-dimensional reconstruction model of the target object based on the corrected predicted vertex coordinates; and, to further improve realism, optimizing the three-dimensional reconstruction model using a loss function. By deeply mining image detail and contour information and finely adjusting the three-dimensional reconstruction result, the method balances the efficiency and quality of the three-dimensional reconstruction model and can improve the precision and robustness of three-dimensional reconstruction.

Description

Three-dimensional reconstruction method and device, terminal equipment and storage medium
Technical Field
The present application belongs to the field of three-dimensional reconstruction technologies, and in particular, to a three-dimensional reconstruction method, an apparatus, a terminal device, and a storage medium.
Background
With the development of deep learning technology, inferring the three-dimensional model of an object from images has become a research hotspot that has attracted wide attention from researchers. Models that reconstruct a target three-dimensionally from images can capture the low-frequency information of the object's surface contour well, and normal maps and light-and-shadow effects make the reconstruction appear relatively credible. However, such methods ignore the high-frequency detail information of the surface and cannot form a high-precision three-dimensional reconstruction model. At present, methods for obtaining high-precision models fall mainly into three categories: (1) manual creation by experienced modelers; (2) scan-capture reconstruction using professional equipment; (3) deep-learning-based methods that establish a mapping from two-dimensional images to a three-dimensional reconstruction model. Generally, the manual method can build a very refined model, but it is time-consuming and inefficient. Reconstruction with professional equipment is expensive and difficult to popularize. Compared with these two methods, deep-learning-based methods clearly have stronger semantic representation capability, but existing image-based three-dimensional reconstruction methods depend on large numbers of training samples, which limits their application range, and because they omit high-frequency texture detail information, the algorithms cannot effectively produce a high-precision model.
Generally speaking, the traditional approaches of manual modeling or scan capture with professional equipment are time-consuming and labor-intensive, and the application range and algorithmic capability of existing image-based target three-dimensional reconstruction methods are limited, so efficient and accurate three-dimensional reconstruction is difficult to achieve with existing image-based methods.
Disclosure of Invention
The embodiments of the present application provide a three-dimensional reconstruction method, a three-dimensional reconstruction device, a terminal device and a storage medium, aiming to solve the problem that existing methods for reconstructing an object's three-dimensional model from images are neither efficient nor accurate.
A first aspect of the embodiments of the present application provides a three-dimensional reconstruction method, which comprises the following steps:
acquiring an image to be processed of a target object;
performing three-dimensional model vertex prediction based on the image to be processed to obtain predicted vertex coordinates;
extracting image frequency information based on the image to be processed to obtain high-frequency detail image information and low-frequency contour image information corresponding to the image to be processed, and superposing the high-frequency detail image information and the low-frequency contour image information to obtain an image key frequency information feature map;
and displacing the predicted vertex coordinates based on the image key frequency information feature map to obtain corrected predicted vertex coordinates, and performing three-dimensional reconstruction based on the corrected predicted vertex coordinates to obtain a target three-dimensional model corresponding to the target object.
A second aspect of the embodiments of the present application provides a three-dimensional reconstruction apparatus, comprising:
the acquisition module is used for acquiring an image to be processed of a target object;
the prediction module is used for performing three-dimensional model vertex prediction based on the image to be processed to obtain predicted vertex coordinates;
the extraction module is used for extracting image frequency information based on the image to be processed to obtain high-frequency detail image information and low-frequency contour image information corresponding to the image to be processed, and superposing the high-frequency detail image information and the low-frequency contour image information to obtain an image key frequency information feature map;
and the three-dimensional reconstruction module is used for displacing the predicted vertex coordinates based on the image key frequency information feature map to obtain corrected predicted vertex coordinates, and performing three-dimensional reconstruction based on the corrected predicted vertex coordinates to obtain a target three-dimensional model corresponding to the target object.
A third aspect of the embodiments of the present application provides a terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method according to the first aspect when executing the computer program.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium in which a computer program is stored, the computer program, when executed by a processor, performing the steps of the method according to the first aspect.
A fifth aspect of the present application provides a computer program product, which, when run on a terminal, causes the terminal to perform the steps of the method of the first aspect described above.
As can be seen from the above, in this embodiment, an image to be processed of a target object is first obtained, and three-dimensional model vertex prediction is performed on it to obtain predicted vertex coordinates; at the same time, image frequency information is extracted from the image to be processed to obtain the high-frequency detail image information and low-frequency contour image information corresponding to the image to be processed, and the two are superposed to obtain an image key frequency information feature map; the predicted vertex coordinates are then displaced based on the image key frequency information feature map to obtain corrected predicted vertex coordinates, and three-dimensional reconstruction is performed based on the corrected predicted vertex coordinates to obtain a target three-dimensional model corresponding to the target object. In this process, the predicted vertices are adjusted by the frequency information contained in the key frequency information feature map extracted from the image to be processed, and the adjusted predicted vertex coordinates are used to increase the detail and contour information of the target three-dimensional model, thereby ensuring the accuracy and integrity of the generated target three-dimensional model.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a first flowchart of a three-dimensional reconstruction method provided in an embodiment of the present application;
fig. 2 is a second flowchart of a three-dimensional reconstruction method provided in an embodiment of the present application;
fig. 3 is a network structure diagram of a three-dimensional reconstruction method according to an embodiment of the present application;
fig. 4 is a third flowchart of a three-dimensional reconstruction method provided in an embodiment of the present application;
fig. 5 is a fourth flowchart of a three-dimensional reconstruction method provided in an embodiment of the present application;
fig. 6 is a structural diagram of a three-dimensional reconstruction apparatus according to an embodiment of the present application;
fig. 7 is a block diagram of a terminal according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
In particular implementations, the terminals described in the embodiments of the present application include, but are not limited to, portable devices such as mobile phones, laptop computers, or tablet computers having touch-sensitive surfaces (e.g., touch-screen displays and/or touch pads). It should also be understood that in some embodiments the device is not a portable communication device, but a desktop computer having a touch-sensitive surface (e.g., a touch-screen display and/or touchpad).
In the discussion that follows, a terminal that includes a display and a touch-sensitive surface is described. However, it should be understood that the terminal may include one or more other physical user interface devices such as a physical keyboard, mouse, and/or joystick.
The terminal supports various applications, such as one or more of the following: a drawing application, a presentation application, a word processing application, a website creation application, a disc burning application, a spreadsheet application, a gaming application, a telephone application, a video conferencing application, an email application, an instant messaging application, an exercise support application, a photo management application, a digital camera application, a web browsing application, a digital music player application, and/or a digital video player application.
Various applications that may be executed on the terminal may use at least one common physical user interface device, such as a touch-sensitive surface. One or more functions of the touch-sensitive surface and corresponding information displayed on the terminal can be adjusted and/or changed between applications and/or within respective applications. In this way, a common physical architecture (e.g., touch-sensitive surface) of the terminal can support various applications with user interfaces that are intuitive and transparent to the user.
It should be understood that the sequence numbers of the steps in this embodiment do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
In order to explain the technical solution described in the present application, the following description will be given by way of specific examples.
Referring to fig. 1, fig. 1 is a first flowchart of a three-dimensional reconstruction method provided in an embodiment of the present application. As shown in fig. 1, a three-dimensional reconstruction method includes the following steps:
step 101, acquiring an image to be processed of a target object.
In some embodiments, the image to be processed is an image obtained by photographing the target object, or an image frame captured from an existing video containing the target object.
Specifically, when the target object is photographed to obtain the image to be processed, a monocular camera may be used, in which case the image to be processed is a monocular image; alternatively, a multi-view camera may be used, in which case the image to be processed is a multi-view image.
The target object may be an animal, a person, a landscape, or the like, and the image to be processed is correspondingly a face image, a whole-body image, a landscape image, or the like.
Step 102, performing three-dimensional model vertex prediction based on the image to be processed to obtain predicted vertex coordinates.
In this embodiment, a vertex prediction network (VPNet) trained in advance is used: the image to be processed is input into the vertex prediction network, and a set of predicted vertex coordinates of the target object on the x, y and z axes is obtained.
By way of example and not limitation, the vertex prediction network may be a convolutional neural network model for object detection, such as RCNN (Region-based Convolutional Neural Networks), Fast RCNN (Fast Region-based Convolutional Neural Networks), which improves performance over RCNN, or Faster RCNN (Faster Region-based Convolutional Neural Networks), and the like. A region-based convolutional neural network model generally comprises two modules, a detection module and a frame labeling module: the first module performs frame labeling on the target object in the image to be processed, and the second module performs convolution operations on the framed target object image from the first module to predict the set of three-dimensional model vertex coordinates of the target object on the x, y and z axes. The calculation is shown in formula (1):
$\hat{V}_i = \mathrm{VPNet}(I_i)$  (1)

where $\hat{V}_i$ denotes the set of three-dimensional model predicted vertex coordinates obtained by the vertex prediction network VPNet; inputting the image to be processed $I_i$ into the vertex prediction network VPNet for vertex coordinate prediction yields the predicted vertex coordinates of the target object.
Step 103, extracting image frequency information based on the image to be processed to obtain high-frequency detail image information and low-frequency contour image information corresponding to the image to be processed, and superposing the high-frequency detail image information and the low-frequency contour image information to obtain an image key frequency information feature map.
It should be understood that image frequency is a measure of how strongly the gray level of pixels in an image varies. The gray level is the brightness of each pixel in the image, with a range of [0, 255], where white is 255 and black is 0. The image frequency reflects the degree of difference between the gray value of an image pixel and those of its neighborhood points: where the gray value changes smoothly is the low-frequency information of the image, and where it changes sharply is the high-frequency information. For example, for an image containing a human face, the frequencies of the whole image represent the face's contour and detail information: the low-frequency information represents the overall contour of the face, and the high-frequency information represents detail wrinkles of the face.
Further, to separate the high-frequency detail image information from the low-frequency contour image information, assume that the R, G, B color values of the image pixels have been extracted by an image rendering tool. With x denoting the gray value of a pixel, x = 0.2989 × R + 0.5870 × G + 0.1140 × B, so the gray value of any pixel can be obtained through this formula. Then a threshold is set; for example, a variation between a pixel's gray value and its neighborhood points within (0, 200) is treated as low frequency and a variation within (200, 255) as high frequency, and the high-frequency detail image information and low-frequency contour image information corresponding to the image to be processed are obtained by this screening.
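A minimal sketch of this threshold-based screening follows, assuming NumPy and approximating the pixel-to-neighborhood variation by the gradient magnitude of the gray image; that choice and the threshold of 200 from the example above are assumptions, since the patent does not fix a particular neighborhood operator.

```python
import numpy as np

def separate_frequencies(rgb: np.ndarray, threshold: float = 200.0):
    """rgb: (H, W, 3) image. Returns (high_freq, low_freq) gray images."""
    # Gray value of each pixel: x = 0.2989*R + 0.5870*G + 0.1140*B
    gray = (0.2989 * rgb[..., 0] + 0.5870 * rgb[..., 1]
            + 0.1140 * rgb[..., 2]).astype(float)
    # Variation between a pixel and its neighborhood points, approximated
    # here by the gradient magnitude of the gray image.
    gy, gx = np.gradient(gray)
    variation = np.hypot(gx, gy)
    low_freq = np.where(variation < threshold, gray, 0.0)    # contour info
    high_freq = np.where(variation >= threshold, gray, 0.0)  # detail info
    return high_freq, low_freq
```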
After the high-frequency detail image information and low-frequency contour image information are obtained, they can be directly superposed to form the image key frequency information feature map. The superposition of the image's high- and low-frequency information is calculated as shown in formula (2):

$F_i^{key} = F_i^{low} + F_i^{high}$  (2)

where $F_i^{low}$ and $F_i^{high}$ denote the low-frequency contour image information and the high-frequency detail image information of the image, respectively, and $F_i^{key}$ denotes the image key frequency information feature map formed by their superposition.
In a specific implementation, the high-frequency detail image information and the low-frequency contour image information can also be processed separately to eliminate the interference of noise in the frequency features and to enhance the features of each. Finally, the processed high-frequency detail image information and low-frequency contour image information are superposed to form the image key frequency information feature map.
Specifically, as an optional implementation, extracting image frequency information based on the image to be processed to obtain the high-frequency detail image information and low-frequency contour image information corresponding to the image to be processed, and superposing them to obtain the image key frequency information feature map, includes:
acquiring a diffuse reflection map of the image to be processed based on the image to be processed; separating the high- and low-frequency information of the diffuse reflection map to obtain the high-frequency detail image information and low-frequency contour image information corresponding to the image to be processed; performing regularization processing on the high-frequency detail image information and the low-frequency contour image information respectively to obtain regularized high-frequency detail image information and regularized low-frequency contour image information; and superposing the regularized high-frequency detail image information and low-frequency contour image information to form the image key frequency information feature map.
It should be understood that the diffuse reflection map contains the inherent color and texture information of the image. Since the texture information contained in the diffuse reflection map does not include illumination or shadow occlusion, the influence of such invalid information present in the image on the three-dimensional reconstruction effect is reduced to a certain extent. The calculation for extracting the diffuse reflection map is shown in formula (3):

$D_i = \mathrm{DiffNet}(I_i)$  (3)

where $D_i$ denotes the diffuse reflection map of the image to be processed $I_i$, and DiffNet denotes the diffuse reflection map network that generates the diffuse reflection map; the diffuse reflection map network may adopt an end-to-end supervised-learning neural network model, such as a CNN (Convolutional Neural Network).
It should also be understood that if the high- and low-frequency information of the image were processed together, the characteristic information of each would be damaged and the image would become blurred; therefore the high- and low-frequency information of the image needs to be separated and then regularized separately to enhance the respective frequency characteristics. The separation of the image's high- and low-frequency information is calculated as shown in formula (4):

$F_i^{low},\ F_i^{high} = \mathrm{FDAE}(D_i)$  (4)

where $F_i^{low}$ and $F_i^{high}$ denote the low-frequency contour image information and the high-frequency detail image information decomposed from the diffuse reflection map $D_i$ of the image to be processed, and FDAE denotes a frequency decoupling algorithm (Frequency Decoupling Auto-Encoder).
By way of example and not limitation, the frequency decoupling algorithm may be a Fourier transform, a wavelet transform, or the like.
In particular, the wavelet toolbox in MATLAB for image decomposition and reconstruction can be used to separate the high- and low-frequency information in image textures. The wavelet toolbox performs the image frequency decoupling as follows: select the diffuse reflection map of the image to be processed and preprocess it with the sym4 wavelet in the wavelet toolbox (i.e., load the image, display it, convert its format, and so on); then decompose the preprocessed image into high- and low-frequency information by calling the wavedec function of the wavelet toolbox, extracting the high- and low-frequency coefficients with the detcoef and appcoef functions respectively during decomposition; reconstruct the high-frequency information and low-frequency information of the image from these coefficients with the wrcoef function; regularize the high- and low-frequency information separately using the L1 norm in MATLAB; and finally superpose the generated high-frequency detail image information and low-frequency contour image information. The superposition calculation is the same as formula (2) and is not repeated here.
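The same decompose-regularize-superpose workflow can be sketched outside MATLAB; the version below uses the PyWavelets package as a stand-in for the wavedec/appcoef/detcoef/wrcoef calls above. The two-level decomposition depth and the simple L1-norm normalization standing in for the regularization step are assumptions of this sketch.

```python
import numpy as np
import pywt  # PyWavelets

def wavelet_decouple(diffuse_gray: np.ndarray, level: int = 2):
    """Split a grayscale diffuse map into low-frequency contour and
    high-frequency detail images with a sym4 wavelet decomposition."""
    coeffs = pywt.wavedec2(diffuse_gray, 'sym4', level=level)
    # Low-frequency contour image: keep only the approximation coefficients.
    low_coeffs = [coeffs[0]] + [tuple(np.zeros_like(d) for d in detail)
                                for detail in coeffs[1:]]
    low_freq = pywt.waverec2(low_coeffs, 'sym4')
    # High-frequency detail image: zero the approximation, keep the details.
    high_coeffs = [np.zeros_like(coeffs[0])] + list(coeffs[1:])
    high_freq = pywt.waverec2(high_coeffs, 'sym4')
    # L1-style normalization of each band before superposition (assumption).
    low_freq = low_freq / (np.abs(low_freq).sum() + 1e-8)
    high_freq = high_freq / (np.abs(high_freq).sum() + 1e-8)
    return high_freq, low_freq  # superposing these gives the key feature map
```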
Step 104, displacing the predicted vertex coordinates based on the image key frequency information feature map to obtain corrected predicted vertex coordinates, and performing three-dimensional reconstruction based on the corrected predicted vertex coordinates to obtain the target three-dimensional model corresponding to the target object.
It should be noted that constructing the target three-dimensional model directly from the predicted vertex coordinates does not show much of the target object's detail; although vertices can be added to the target three-dimensional model by direct tessellation, this does not add detail, but merely adds more vertices lying in the same plane as the original ones. To increase the detail of the target three-dimensional model, the positions of the predicted vertex coordinates need to be moved in some way. The predicted vertices can therefore be adjusted by the frequency information contained in the existing key frequency information feature map: since the pixel points of the image key frequency information feature map and the predicted vertex coordinates are in one-to-one correspondence, the predicted vertex coordinates can be adjusted according to the feature map, which determines both the direction and the displacement amount of the adjustment.
It should be understood that the image to be processed contains light-reflection information. For a pixel point in the image, illumination produces a line perpendicular to the tangent plane at that point, and because light reflection is irregular, this line has a direction. The direction value n of this line is stored in the red, green and blue channels as compressed x-, y- and z-axis coordinate values; the direction values n formed in this way correspond one-to-one with the vertices of the target three-dimensional model, so the direction value n can be used to determine the adjustment direction for each predicted vertex coordinate. Assuming the x-axis coordinate of a predicted vertex is a value in the range [-1, 1], the compressed x-axis component of the direction value n is obtained by the formula (0.5x + 0.5) × 255; the y-axis and z-axis coordinate values can be compressed by the same formula. The vector formed by these three values constitutes the direction value n, and multiplying the direction value n by the corresponding predicted vertex's x, y, z coordinate values gives the predicted vertex's value after the adjustment direction is applied.
It should also be understood that the image frequency stored in the image key frequency information feature map measures the intensity of gray-level variation of the image pixels, and this intensity can be used to describe the height of protrusions and depressions on the object's surface. This is defined as a height value h, which can be stored per pixel in a color channel; the height value h lies in the range [0, 255], with white represented by 0 and black by 255. Here a common gray extraction formula based on human-eye perception, a weighted average of the pixel's RGB values, can be adopted: height h = 0.2126 × R + 0.7152 × G + 0.0722 × B, with weights derived from the human eye's differing sensitivity to the three colors.
Further, the predicted vertex coordinates are displaced according to the obtained displacement direction and the displacement value along that direction; the calculation is shown in formula (5):

$p' = p + (h - 1)\,n$  (5)

where p is the current predicted vertex coordinate, p' is the corrected predicted vertex coordinate, n is the displacement direction value by which the predicted vertex coordinate is to be adjusted, and h is the height value of the displacement. Here the height value h ∈ [0, 1]; subtracting 1 from h maps its interval [0, 1] to [-1, 0]. Because the surface normal usually faces outward from the mesh, an outward offset can be replaced by an inward one, and it is generally more convenient to push the geometry inward than to pull it outward. In short, the corrected predicted vertex coordinate is the sum of the current predicted vertex coordinate p and the product of the displacement direction value n and the height term.
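A compact sketch of formula (5) and of the two decoding steps described above follows, assuming NumPy arrays; the helper names are hypothetical.

```python
import numpy as np

def decode_normal(rgb: np.ndarray) -> np.ndarray:
    # Inverse of the compression (0.5*x + 0.5) * 255 described above:
    # recover direction values n with components in [-1, 1] from RGB.
    return rgb.astype(float) / 255.0 * 2.0 - 1.0

def height_from_rgb(rgb: np.ndarray) -> np.ndarray:
    # Perception-weighted gray value used as the height, rescaled to [0, 1]:
    # h = 0.2126*R + 0.7152*G + 0.0722*B
    return (0.2126 * rgb[..., 0] + 0.7152 * rgb[..., 1]
            + 0.0722 * rgb[..., 2]) / 255.0

def displace_vertices(p: np.ndarray, n: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Formula (5): p' = p + (h - 1) * n.
    p: (N, 3) predicted vertex coordinates; n: (N, 3) per-vertex direction
    values; h: (N,) per-vertex height values in [0, 1] sampled from the key
    frequency information feature map. Subtracting 1 maps h into [-1, 0],
    pushing vertices inward along the outward-facing normal."""
    return p + (h[:, None] - 1.0) * n
```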
Furthermore, each corrected predicted vertex coordinate is connected with its adjacent predicted vertex coordinates to form a mesh, and the meshes connecting all predicted vertex coordinates form the target three-dimensional model corresponding to the target object.
Specifically, as an optional implementation, displacing the predicted vertex coordinates based on the image key frequency information feature map to obtain corrected predicted vertex coordinates, and performing three-dimensional reconstruction based on the corrected predicted vertex coordinates to obtain the target three-dimensional model corresponding to the target object, includes:
extracting a displacement map from the image key frequency information feature map, wherein the displacement map contains the displacement amount by which each predicted vertex coordinate is displaced along its normal direction; displacing the predicted vertex coordinates along the normal direction by the obtained displacement amounts to obtain the corrected predicted vertex coordinates; and constructing the target three-dimensional model corresponding to the target object based on the corrected predicted vertex coordinates.
It should be understood that the displacement map, also called a height map, can be used to describe the protrusions and depressions of the object surface; each pixel of the displacement map uses a single color channel to store a height value h. Each pixel of the normal map uses three color channels to store, for the illumination at that pixel, the direction value n perpendicular to the pixel's tangent plane. Displacement optimization can therefore be performed on the predicted vertex coordinates along a definite direction and by a definite displacement value, by obtaining the displacement amounts contained in the displacement map and the direction values in the normal map.
By way of example and not limitation, the displacement map may be created from the image to be processed of the target object with drawing software such as Photoshop, or generated by inputting the image to be processed into a neural network model, for example a Convolutional Neural Network (CNN) or Fast RCNN (Fast Region-based Convolutional Neural Networks), which improves performance over RCNN, and the like. The generated displacement map contains the displacement amounts by which the predicted vertex coordinates are displaced along their normal directions, and the predicted vertex coordinates are corrected by these displacement amounts; the calculation is shown in formula (6):

$\hat{M}_i = \mathrm{Displacement}(F_i^{key},\ \hat{V}_i)$  (6)

where Displacement(·) denotes the operation that displaces the predicted vertices, $\hat{M}_i$ denotes the generated target three-dimensional model, $F_i^{key}$ denotes the image key frequency information feature map formed by superposition, and $\hat{V}_i$ denotes the predicted vertex coordinates.
By way of example and not limitation, the normal map may be obtained directly from the image to be processed using the "3D" sub-menu of the "Filter" tool in Photoshop, or by training a neural network such as a Generative Adversarial Network (GAN) to produce the normal map of the image.
Specifically, the corrected predicted vertex coordinates are obtained by reading the height value h from the displacement map and the displacement direction value n from the normal map, and displacing each predicted vertex coordinate by the height value h along the direction value n; the calculation is the same as formula (5) and is not repeated here.
Further, by way of example and not limitation, the Open Graphics Library (OpenGL) or Computer-Aided Design (CAD) software may be used: the obtained predicted vertex coordinates are input, some other auxiliary information is set, and the target three-dimensional model corresponding to the target object is obtained directly, which is not described herein again.
In this embodiment, an image to be processed of a target object is first obtained, and three-dimensional model vertex prediction is performed on it to obtain predicted vertex coordinates; at the same time, image frequency information is extracted from the image to be processed to obtain the corresponding high-frequency detail image information and low-frequency contour image information, which are superposed to obtain the image key frequency information feature map; the predicted vertex coordinates are then displaced based on the image key frequency information feature map to obtain corrected predicted vertex coordinates, and three-dimensional reconstruction is performed based on the corrected predicted vertex coordinates to obtain the target three-dimensional model corresponding to the target object. In this process, the predicted vertex coordinates are adjusted by the frequency information contained in the key frequency information feature map extracted from the image to be processed, increasing the detail and contour information of the generated target three-dimensional model and thereby ensuring its accuracy and integrity.
Different embodiments of the three-dimensional reconstruction method are also provided in the embodiments of the present application.
Referring to fig. 2, fig. 2 is a second flowchart of the three-dimensional reconstruction method provided. As shown in fig. 2, the three-dimensional reconstruction method includes the following steps:
step 201, acquiring an image to be processed of a target object.
The implementation process of this step is the same as that of step 101 in the foregoing embodiment, and is not described here again.
Step 202, inputting the image to be processed into a vertex prediction network to obtain the preliminary predicted vertex coordinates of the image to be processed.
The implementation process of this step is the same as that of step 102 in the foregoing embodiment, and is not described here again.
Step 203, upsampling the preliminary predicted vertex coordinates, and downsampling the upsampled preliminary predicted vertex coordinates.
It should be understood that, in the field of three-dimensional reconstruction, upsampling in three-dimensional model processing is based on the following idea: two vertex coordinates of a three-dimensional model determine a line, three points determine a plane, and three vertices of the three-dimensional model form a triangular mesh; the basic processing unit of the three-dimensional model is the mesh. Given a mesh formed by three vertices, the upsampling operation interpolates vertices within the mesh region formed by those three vertices, increasing the number of vertices; the triangular mesh is thereby split into multiple meshes formed by the interpolated vertices. In effect, the added meshes increase the detail of the three-dimensional model's surface, making the model smoother and more delicate. The upsampling operation may adopt bilinear interpolation, transposed convolution, or the like, and is calculated as shown in formula (7):
$\hat{V}_i^{up} = \mathrm{Upsampling}(\hat{V}_i)$  (7)

where $\hat{V}_i$ denotes the predicted vertex coordinates of the target object and $\hat{V}_i^{up}$ denotes the set of predicted vertex coordinates after the upsampling operation.
Furthermore, the vertex coordinates after the upsampling operation are downsampled to keep the same dimension as the target object. Downsampling ensures that the number of vertices is consistent with the number of predicted vertices obtained through the vertex prediction network; the downsampling of the vertex coordinate set may use random downsampling or the like, so that the vertex prediction network can subsequently be optimized by the first preset loss function.
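As an illustration of this idea, the sketch below implements one possible interpolation-based upsampling (edge-midpoint insertion on triangular meshes) and random downsampling, assuming NumPy; since the patent also allows bilinear interpolation or transposed convolution, this concrete scheme is an assumption.

```python
import numpy as np

def upsample_vertices(vertices: np.ndarray, faces: np.ndarray) -> np.ndarray:
    """Insert the midpoint of each edge of every triangular mesh as a new
    vertex, splitting each triangle's area into smaller meshes. Midpoints
    of shared edges are duplicated here for simplicity."""
    v0, v1, v2 = vertices[faces[:, 0]], vertices[faces[:, 1]], vertices[faces[:, 2]]
    midpoints = np.concatenate([(v0 + v1) / 2, (v1 + v2) / 2, (v2 + v0) / 2])
    return np.concatenate([vertices, midpoints])

def downsample_vertices(vertices: np.ndarray, target_count: int) -> np.ndarray:
    # Random downsampling back to the vertex count produced by VPNet, so the
    # first preset loss function can compare against the real vertices.
    idx = np.random.choice(len(vertices), size=target_count, replace=False)
    return vertices[idx]
```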
Step 204, combining the real vertex coordinates of the target object, performing vertex prediction constraint on the downsampled preliminary predicted vertex coordinates using a first preset loss function to obtain the predicted vertex coordinates, wherein the first preset loss function is used to calculate the difference between the real vertex coordinates and the downsampled preliminary predicted vertex coordinates.
As can be seen from the arrow pointing to the "first preset loss function" in fig. 3, calculating the first preset loss function requires the downsampled version of the upsampled preliminary predicted vertex coordinates and the real vertex coordinates of the target object used to train the vertex prediction network VPNet. The difference between the two is computed by the first preset loss function, and the vertex prediction network parameters are continuously adjusted by this difference so that the predicted vertex coordinates approach the real vertex coordinates as closely as possible, until the difference between the two is minimized.
By way of example and not limitation, the first preset loss function may be a mean squared error (MSE) function; alternatively, the first preset loss function may adopt the mean absolute error, an absolute-value loss function, a logarithmic loss function, or the like as needed. The first preset loss function calculated using the mean squared error is shown in formula (8):
$L_1 = \mathrm{MSE}\big(V_i,\ \mathrm{Downsampling}(\hat{V}_i^{up})\big)$  (8)

where MSE is the mean squared error, Downsampling denotes the downsampling operation performed on the upsampled preliminary predicted vertex coordinates, $V_i$ is the real vertex coordinates, and $\hat{V}_i^{up}$ is the preliminary predicted vertex coordinates after the upsampling operation.
It should be understood that the mean squared error MSE is the average of the squared differences between the predicted values and the true values of the parameters; the smaller the obtained value, the more accurate the current vertex prediction network model. The parameters of the vertex prediction network model can therefore be adjusted by predicting the vertex coordinates many times and observing how the predicted values change, so as to obtain a more accurate vertex prediction effect.
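As a sketch, the first preset loss function of formula (8) reduces to an MSE over vertex coordinates (NumPy assumed):

```python
import numpy as np

def first_preset_loss(real_vertices: np.ndarray,
                      downsampled_pred_vertices: np.ndarray) -> float:
    """Formula (8): MSE between the real vertex coordinates V_i and the
    downsampled preliminary predicted vertex coordinates, both (N, 3).
    The smaller the value, the more accurate the vertex prediction network."""
    diff = real_vertices.astype(float) - downsampled_pred_vertices.astype(float)
    return float(np.mean(diff ** 2))
```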
Step 205, extracting image frequency information based on the image to be processed to obtain the high-frequency detail image information and low-frequency contour image information corresponding to the image to be processed, and superposing the high-frequency detail image information and the low-frequency contour image information to obtain the image key frequency information feature map.
the implementation process of this step is the same as the implementation process of step 103 in the foregoing embodiment, and is not described here again.
Step 206, displacing the predicted vertex coordinates based on the image key frequency information feature map to obtain corrected predicted vertex coordinates, and performing three-dimensional reconstruction based on the corrected predicted vertex coordinates to obtain the target three-dimensional model corresponding to the target object.
The implementation process of this step is the same as that of step 104 in the foregoing embodiment, and is not described here again.
In this embodiment of the application, an image to be processed of a target object is obtained and input into a vertex prediction network to obtain preliminary predicted vertex coordinates; the preliminary predicted vertex coordinates are upsampled and downsampled, and vertex prediction constraint is then performed on the downsampled preliminary predicted vertex coordinates using a preset loss function to obtain the predicted vertex coordinates. At the same time, image frequency information is extracted from the image to be processed to obtain the corresponding high-frequency detail image information and low-frequency contour image information, which are superposed to obtain the image key frequency information feature map. The predicted vertex coordinates are then displaced based on the image key frequency information feature map to obtain corrected predicted vertex coordinates, and three-dimensional reconstruction is performed based on the corrected predicted vertex coordinates to obtain the target three-dimensional model corresponding to the target object. In this process, the predicted vertex coordinates are adjusted by the frequency information contained in the key frequency information feature map extracted from the image to be processed, increasing the detail and contour information of the generated target three-dimensional model; further, a loss function is adopted to constrain the predicted vertex coordinates, ensuring their accuracy and, on that basis, the accuracy and robustness of the generated target three-dimensional model.
When the image key frequency information feature map is obtained based on the diffuse reflection map of the image to be processed, referring to fig. 4, after displacing the predicted vertex coordinates based on the image key frequency information feature map to obtain corrected predicted vertex coordinates and performing three-dimensional reconstruction based on the corrected predicted vertex coordinates to obtain the target three-dimensional model corresponding to the target object, the method further includes:
step 401, mapping the diffuse reflection map to a target three-dimensional model to obtain a target three-dimensional model with inherent color and texture, and projecting the target three-dimensional model with inherent color and texture to a two-dimensional pixel space to obtain a rendered two-dimensional image.
It should be understood that the pixel points of the diffuse reflection map contain the inherent color and texture of the image, and a pixel point on the two-dimensional plane map can be represented by u and v coordinates, with u in the horizontal direction and v in the vertical direction; the vertices of the target three-dimensional model have coordinates x, y and z. The pixel points of the diffuse reflection map and the vertices of the target three-dimensional model are in one-to-one correspondence, so the content at each (u, v) position can be pasted onto the corresponding vertex coordinates of the target three-dimensional model, giving the target three-dimensional model the inherent color and texture of the diffuse reflection map.
Step 402, based on the two-dimensional image, combining the pixel values of the image to be processed, performing pixel consistency constraint on the target three-dimensional model using a second preset loss function to obtain an optimized target three-dimensional model; the second preset loss function is used to calculate the pixel difference between the two-dimensional image and the image to be processed.
Specifically, as can be seen from the arrow pointing to the "second preset loss function" in fig. 3, the second preset loss function compares, at the same pixel positions, the corresponding two-dimensional image obtained by projecting the target three-dimensional model into the two-dimensional pixel space with the image to be processed $I_i$ of the target object, and the target three-dimensional model is optimized by calculating the pixel-value difference between them. The second preset loss function calculated using the mean squared error is shown in formula (9):

$L_2 = \mathrm{MSE}\big(\mathrm{Render}(\hat{M}_i,\ D_i),\ I_i\big)$  (9)

where Render(·) denotes the rendering operation, which renders the three-dimensional object into a corresponding two-dimensional image for processing, $\hat{M}_i$ is the target three-dimensional model, $D_i$ is the diffuse reflection map of the image to be processed, and $I_i$ is the image to be processed. As before, the second preset loss function may also adopt the mean absolute error, an absolute-value loss function, a logarithmic loss function, or the like as needed, consistent with the first preset loss function in step 204, which is not repeated here.
In this embodiment, the diffuse reflection map is mapped onto the target three-dimensional model to obtain a target three-dimensional model with inherent color and texture, and this model is projected into the two-dimensional pixel space to obtain a rendered two-dimensional image; then, based on the two-dimensional image and combining the pixel values of the image to be processed, pixel consistency constraint is performed on the target three-dimensional model using a second preset loss function to obtain an optimized target three-dimensional model, the second preset loss function being used to calculate the pixel difference between the two-dimensional image and the image to be processed. By calculating the pixel difference between the images and using it to optimize the target three-dimensional model, the accuracy and precision of the target three-dimensional model are improved.
Further, referring to fig. 5, after displacing the predicted vertex coordinates based on the image key frequency information feature map to obtain corrected predicted vertex coordinates and performing three-dimensional reconstruction based on the corrected predicted vertex coordinates to obtain the target three-dimensional model corresponding to the target object, the method may further include:
step 501, projecting the target three-dimensional model to a two-dimensional pixel space to obtain a two-dimensional image rendered by the target three-dimensional model.
It should be understood that the two-dimensional image here is obtained by projecting into the two-dimensional pixel space the target three-dimensional model to which the diffuse reflection map has not been applied, so as to avoid the influence of information such as the inherent color and texture contained in the diffuse reflection map on the image frequency information.
Step 502, extracting image frequency information from the image to be processed and from the two-dimensional image respectively; taking the image frequency information extracted from the image to be processed as the reference for the image frequency information extracted from the two-dimensional image, performing image frequency consistency constraint on the target three-dimensional model using a third preset loss function to obtain an optimized target three-dimensional model; the third preset loss function is used to calculate the image frequency difference between the two-dimensional image and the image to be processed.
It should be understood that, as can be seen from the arrow pointing to the "third preset loss function" in fig. 3, the third preset loss function compares the image frequency information contained in the corresponding two-dimensional image obtained by projecting the target three-dimensional model into the two-dimensional pixel space with the image frequency information of the image to be processed of the target object, and the target three-dimensional model is optimized by calculating the image frequency difference between the two-dimensional image and the image to be processed. The third preset loss function calculated using the mean squared error function is shown in formula (10):

$L_3 = \mathrm{MSE}\big(\mathrm{Extract}(\mathrm{Render}(\hat{M}_i)),\ \mathrm{Extract}(I_i)\big)$  (10)

where Extract(·) denotes the image frequency extraction operation, $\mathrm{Render}(\hat{M}_i)$ is the two-dimensional image acquired by projecting the target three-dimensional model into two-dimensional space, $\mathrm{Extract}(\mathrm{Render}(\hat{M}_i))$ is the frequency information extracted from that two-dimensional image, and $\mathrm{Extract}(I_i)$ is the image frequency information of the image to be processed of the target object. As before, the third preset loss function may also adopt the mean absolute error, an absolute-value loss function, a logarithmic loss function, or the like as needed, consistent with the first preset loss function in step 204, which is not repeated here.
In this embodiment, the target three-dimensional model is projected into the two-dimensional pixel space to obtain a two-dimensional image rendered from the target three-dimensional model; taking the image frequency information extracted from the image to be processed as the reference for the image frequency information extracted from the two-dimensional image, image frequency consistency constraint is performed on the target three-dimensional model using a third preset loss function to obtain an optimized target three-dimensional model, the third preset loss function being used to calculate the image frequency difference between the two-dimensional image and the image to be processed. Optimizing the target three-dimensional model with this difference improves the model's accuracy and precision.
Further, in order to achieve better three-dimensional reconstruction, an overall optimization loss function of the three-dimensional reconstruction method is designed from the three preset loss functions.
Specifically, the first preset loss function is given a first weight, the second preset loss function a second weight, and the third preset loss function a third weight, where the first, second and third weights are all adjustable. This is shown in formula (11):
Figure BDA0003417848050000172
wherein, Loss represents the overall optimization objective,
Figure BDA0003417848050000173
weights of the first preset loss function, the second preset loss function and the third preset loss function are respectively expressed, and the three-dimensional reconstruction model can have different reconstruction effects through adjustment of the weights. For example, by adjusting a first predetermined loss function
Figure BDA0003417848050000174
After weighting, more accurate predicted vertex coordinates are formed, and the second preset loss function is adjusted
Figure BDA0003417848050000175
After weighting, the three-dimensional reconstruction model rendering image can be optimized, and the third preset loss function is adjusted
Figure BDA0003417848050000176
After weighting, the detail information of the three-dimensional reconstruction model can be more prominent, and a better three-dimensional reconstruction effect can be obtained by adjusting the weight values of the three loss functions.
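For illustration, a minimal sketch of the overall objective in formula (11), with the three weights exposed as adjustable parameters (all names are hypothetical):

```python
def overall_loss(loss_1: float, loss_2: float, loss_3: float,
                 w1: float = 1.0, w2: float = 1.0, w3: float = 1.0) -> float:
    # Weighted sum of the first, second, and third preset loss functions,
    # as in formula (11); w1, w2, and w3 are the adjustable weights.
    return w1 * loss_1 + w2 * loss_2 + w3 * loss_3
```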
Referring to fig. 6, fig. 6 is a structural diagram of a three-dimensional reconstruction apparatus provided in an embodiment of the present application; for convenience of description, only the parts related to this embodiment are shown.
The three-dimensional reconstruction apparatus 600 includes:
an obtaining module 601, configured to obtain an image to be processed of a target object;
the prediction module 602 is configured to perform three-dimensional model vertex prediction based on the image to be processed to obtain predicted vertex coordinates;
the extraction module 603 is configured to extract image frequency information based on the image to be processed, obtain high-frequency detail image information and low-frequency contour image information corresponding to the image to be processed, and superimpose the high-frequency detail image information and the low-frequency contour image information to obtain an image key frequency information feature map;
and the three-dimensional reconstruction module 604 is configured to displace the predicted vertex coordinates based on the image key frequency information feature map to obtain corrected predicted vertex coordinates, and to perform three-dimensional reconstruction based on the corrected predicted vertex coordinates to obtain a target three-dimensional model corresponding to the target object.
The prediction module 602 is specifically configured to:
inputting the image to be processed into a vertex prediction network to obtain preliminary predicted vertex coordinates of the image to be processed;
up-sampling the preliminary predicted vertex coordinates, and down-sampling the up-sampled preliminary predicted vertex coordinates;
and combining the real vertex coordinates of the target object, and performing a vertex prediction constraint on the down-sampled preliminary predicted vertex coordinates by using a first preset loss function to obtain the predicted vertex coordinates, wherein the first preset loss function is used for calculating a difference value between the real vertex coordinates and the down-sampled preliminary predicted vertex coordinates.
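For illustration only, a minimal sketch of the first preset loss function, assuming the difference value is computed as a mean square error over the vertex coordinates (the application equally allows other difference measures; the names are hypothetical):

```python
import numpy as np

def vertex_prediction_loss(pred_vertices: np.ndarray, gt_vertices: np.ndarray) -> float:
    # First preset loss: difference between the down-sampled preliminary
    # predicted vertex coordinates, shape (N, 3), and the real vertex
    # coordinates of the target object, here as a mean square error.
    return float(np.mean((pred_vertices - gt_vertices) ** 2))
```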
the extracting module 603 is specifically configured to:
acquiring a diffuse reflection map of the image to be processed based on the image to be processed;
separating high-frequency and low-frequency information of the diffuse reflection mapping to obtain high-frequency detail image information and low-frequency contour image information corresponding to the image to be processed;
respectively carrying out regularization processing on the high-frequency detail image information and the low-frequency contour image information to obtain regularized high-frequency detail image information and regularized low-frequency contour image information;
and superposing the regularized high-frequency detail image information and low-frequency contour image information to form the image key frequency information feature map.
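The application leaves the concrete separation and regularization schemes open; a minimal NumPy/SciPy sketch, assuming a Gaussian low-pass for the low-frequency contour band, the residual for the high-frequency detail band, and min-max regularization of each band before superposition, could be:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def key_frequency_feature_map(albedo: np.ndarray, sigma: float = 3.0) -> np.ndarray:
    # albedo: single-channel diffuse reflection map as a 2-D float array.
    low = gaussian_filter(albedo, sigma=sigma)  # low-frequency contour information
    high = albedo - low                         # high-frequency detail information

    def regularize(band: np.ndarray) -> np.ndarray:
        # Min-max regularization of one frequency band (a hypothetical choice).
        span = band.max() - band.min()
        return (band - band.min()) / span if span > 0 else np.zeros_like(band)

    # Superpose the regularized bands into the key frequency feature map.
    return regularize(high) + regularize(low)
```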
Correspondingly, the extracting module 603 is further configured to:
mapping the diffuse reflection map to a target three-dimensional model to obtain a target three-dimensional model with inherent color and texture, and projecting the target three-dimensional model with inherent color and texture to a two-dimensional pixel space to obtain a rendered two-dimensional image;
based on the two-dimensional image and in combination with the pixel values of the image to be processed, performing a pixel consistency constraint on the target three-dimensional model by using a second preset loss function to obtain an optimized target three-dimensional model; wherein the second preset loss function is used for calculating the pixel difference value between the two-dimensional image and the image to be processed.
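For illustration, the second preset loss function may, under the same mean-square-error choice mentioned for the other losses, be sketched as follows (the names are hypothetical):

```python
import numpy as np

def pixel_consistency_loss(rendered: np.ndarray, target: np.ndarray) -> float:
    # Second preset loss: pixel difference between the rendered two-dimensional
    # image and the image to be processed, here as a mean square error.
    diff = rendered.astype(np.float64) - target.astype(np.float64)
    return float(np.mean(diff ** 2))
```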
The three-dimensional reconstruction module 604 is specifically configured to:
extracting a displacement map from the image key frequency information feature map, wherein the displacement map contains the displacement amount by which each predicted vertex coordinate is to be displaced along its normal direction;
displacing each predicted vertex coordinate along its normal direction by the obtained displacement amount to obtain the corrected predicted vertex coordinates;
and constructing a target three-dimensional model corresponding to the target object based on the corrected predicted vertex coordinates.
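For illustration only, a minimal NumPy sketch of this vertex correction step, assuming one signed displacement amount per vertex is sampled from the displacement map (the array shapes are assumptions):

```python
import numpy as np

def displace_vertices(vertices: np.ndarray, normals: np.ndarray,
                      displacement: np.ndarray) -> np.ndarray:
    # vertices:     (N, 3) predicted vertex coordinates
    # normals:      (N, 3) vertex normals
    # displacement: (N,)   signed displacement amounts from the displacement map
    unit = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    # Move each predicted vertex along its unit normal to correct it.
    return vertices + displacement[:, None] * unit
```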
Correspondingly, the three-dimensional reconstruction module 604 is further configured to:
projecting the target three-dimensional model to a two-dimensional pixel space to obtain a two-dimensional image rendered by the target three-dimensional model;
respectively extracting image frequency information from the image to be processed and the two-dimensional image, referring to the extracted image frequency information of the image to be processed based on the extracted image frequency information of the two-dimensional image, and performing image frequency consistency constraint on the target three-dimensional model by using a third preset loss function to obtain an optimized target three-dimensional model; and the third preset loss function is used for calculating an image frequency difference value between the two-dimensional image and the image to be processed.
The three-dimensional reconstruction apparatus provided in this embodiment of the present application can implement each process of the above three-dimensional reconstruction method embodiments and achieve the same technical effects; to avoid repetition, details are not described here again.
Fig. 7 is a block diagram of a terminal according to an embodiment of the present application. As shown in the figure, the terminal 7 of this embodiment includes: at least one processor 70 (only one is shown in fig. 7), a memory 71, and a computer program 72 stored in the memory 71 and executable on the at least one processor 70; when executing the computer program 72, the processor 70 implements the steps in any of the method embodiments described above.
The terminal 7 may be a desktop computer, a notebook, a palmtop computer, a cloud server, or another computing device. The terminal 7 may include, but is not limited to, the processor 70 and the memory 71. Those skilled in the art will appreciate that fig. 7 is only an example of the terminal 7 and does not constitute a limitation of it; the terminal may comprise more or fewer components than those shown, combine some components, or use different components; for example, it may further comprise input/output devices, network access devices, buses, and the like.
The processor 70 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 71 may be an internal storage unit of the terminal 7, such as a hard disk or a memory of the terminal 7. The memory 71 may also be an external storage device of the terminal 7, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) and the like provided on the terminal 7. Further, the memory 71 may also include both an internal storage unit and an external storage device of the terminal 7. The memory 71 is used for storing the computer program and other programs and data required by the terminal. The memory 71 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal and method may be implemented in other ways. For example, the above-described apparatus/terminal embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow in the methods of the embodiments described above may be implemented by a computer program, which may be stored in a computer-readable storage medium and which, when executed by a processor, implements the steps of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals.
The present application may also implement all or part of the processes in the methods of the above embodiments by means of a computer program product: when the computer program product runs on a terminal, the terminal, in executing it, implements the steps of the above method embodiments.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A method of three-dimensional reconstruction, comprising:
acquiring an image to be processed of a target object;
performing three-dimensional model vertex prediction based on the image to be processed to obtain predicted vertex coordinates;
extracting image frequency information based on the image to be processed to obtain high-frequency detail image information and low-frequency contour image information corresponding to the image to be processed, and superposing the high-frequency detail image information and the low-frequency contour image information to obtain an image key frequency information feature map;
and displacing the predicted vertex coordinates based on the image key frequency information feature map to obtain corrected predicted vertex coordinates, and performing three-dimensional reconstruction based on the corrected predicted vertex coordinates to obtain a target three-dimensional model corresponding to the target object.
2. The three-dimensional reconstruction method according to claim 1, wherein performing three-dimensional model vertex prediction based on the image to be processed to obtain predicted vertex coordinates comprises:
inputting the image to be processed into a vertex prediction network to obtain a preliminary prediction vertex coordinate of the image to be processed;
up-sampling the preliminary prediction vertex coordinates, and down-sampling the up-sampled preliminary prediction vertex coordinates;
and combining the real vertex coordinates of the target object, and performing vertex prediction constraint on the preliminarily predicted vertex coordinates after down-sampling by utilizing a first preset loss function to obtain the predicted vertex coordinates, wherein the first preset loss function is used for calculating a difference value between the real vertex coordinates and the preliminarily predicted vertex coordinates after down-sampling.
3. The three-dimensional reconstruction method according to claim 1, wherein extracting image frequency information based on the image to be processed to obtain high-frequency detail image information and low-frequency contour image information corresponding to the image to be processed, and superimposing the high-frequency detail image information and the low-frequency contour image information to obtain an image key frequency information feature map comprises:
acquiring a diffuse reflection map of the image to be processed based on the image to be processed;
separating high-frequency and low-frequency information of the diffuse reflection map to obtain high-frequency detail image information and low-frequency contour image information corresponding to the image to be processed;
performing regularization processing on the high-frequency detail image information and the low-frequency contour image information respectively to obtain regularized high-frequency detail image information and regularized low-frequency contour image information;
and superposing the regularized high-frequency detail image information and low-frequency contour image information to form the image key frequency information feature map.
4. The three-dimensional reconstruction method according to claim 3, wherein, after the predicted vertex coordinates are displaced based on the image key frequency information feature map to obtain corrected predicted vertex coordinates, and the three-dimensional reconstruction is performed based on the corrected predicted vertex coordinates to obtain a target three-dimensional model corresponding to the target object, the method further comprises:
mapping the diffuse reflection map to the target three-dimensional model to obtain the target three-dimensional model with inherent color and texture, and projecting the target three-dimensional model with inherent color and texture to a two-dimensional pixel space to obtain a rendered two-dimensional image;
based on the two-dimensional image and in combination with the pixel values of the image to be processed, performing a pixel consistency constraint on the target three-dimensional model by using a second preset loss function to obtain the optimized target three-dimensional model; wherein the second preset loss function is used for calculating the pixel difference value between the two-dimensional image and the image to be processed.
5. The three-dimensional reconstruction method according to claim 1, wherein displacing the predicted vertex coordinates based on the image key frequency information feature map to obtain corrected predicted vertex coordinates, and performing three-dimensional reconstruction based on the corrected predicted vertex coordinates to obtain a target three-dimensional model corresponding to the target object comprises:
extracting a displacement map from the image key frequency information feature map, wherein the displacement map contains the displacement amount by which each predicted vertex coordinate is to be displaced along its normal direction;
displacing the predicted vertex coordinate along the normal direction according to the obtained displacement to obtain a corrected predicted vertex coordinate;
and constructing a target three-dimensional model corresponding to the target object based on the corrected predicted vertex coordinates.
6. The three-dimensional reconstruction method according to claim 1, wherein, after the predicted vertex coordinates are displaced based on the image key frequency information feature map to obtain corrected predicted vertex coordinates, and a target three-dimensional model corresponding to the target object is obtained based on the corrected predicted vertex coordinates by performing three-dimensional reconstruction, the method further comprises:
projecting the target three-dimensional model to a two-dimensional pixel space to obtain a two-dimensional image rendered by the target three-dimensional model;
respectively extracting image frequency information from the image to be processed and the two-dimensional image, referring to the extracted image frequency information of the image to be processed based on the extracted image frequency information of the two-dimensional image, and performing image frequency consistency constraint on the target three-dimensional model by using a third preset loss function to obtain an optimized target three-dimensional model; and the third preset loss function is used for calculating an image frequency difference value between the two-dimensional image and the image to be processed.
7. A three-dimensional reconstruction apparatus, comprising:
the acquisition module is used for acquiring an image to be processed of a target object;
the prediction module is used for carrying out three-dimensional model vertex prediction based on the image to be processed to obtain a prediction vertex coordinate;
the extraction module is used for extracting image frequency information based on the image to be processed to obtain high-frequency detail image information and low-frequency contour image information corresponding to the image to be processed, and superposing the high-frequency detail image information and the low-frequency contour image information to obtain an image key frequency information feature map;
and the three-dimensional reconstruction module is used for displacing the predicted vertex coordinates based on the image key frequency information feature map to obtain corrected predicted vertex coordinates, and performing three-dimensional reconstruction based on the corrected predicted vertex coordinates to obtain a target three-dimensional model corresponding to the target object.
8. The three-dimensional reconstruction apparatus of claim 7, wherein the prediction module is specifically configured to:
inputting the image to be processed into a vertex prediction network to obtain a preliminary prediction vertex coordinate of the image to be processed;
up-sampling the preliminary prediction vertex coordinates, and down-sampling the up-sampled preliminary prediction vertex coordinates;
and combining the real vertex coordinates of the target object, and performing vertex prediction constraint on the preliminarily predicted vertex coordinates after down-sampling by utilizing a first preset loss function to obtain the predicted vertex coordinates, wherein the first preset loss function is used for calculating a difference value between the real vertex coordinates and the preliminarily predicted vertex coordinates after down-sampling.
9. A terminal device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method according to any one of claims 1 to 6 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
CN202111553514.6A 2021-12-17 2021-12-17 Three-dimensional reconstruction method and device, terminal equipment and storage medium Pending CN114373056A (en)

Priority Applications (1)

Application Number CN202111553514.6A, priority and filing date 2021-12-17: Three-dimensional reconstruction method and device, terminal equipment and storage medium

Publications (1)

Publication Number CN114373056A, published 2022-04-19

Family ID: 81139739

Country Status (1): CN, CN114373056A (en)



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination