CN114373056A - Three-dimensional reconstruction method and device, terminal equipment and storage medium

Info

Publication number: CN114373056A
Application number: CN202111553514.6A
Original language: Chinese (zh)
Inventors: 陶大鹏 (Tao Dapeng), 石宇航 (Shi Yuhang)
Applicant/Assignee: Yunnan United Visual Technology Co., Ltd.
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 - Three-dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20 - Finite element generation, e.g. wire-frame surface description, tessellation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G06T7/13 - Edge detection


Abstract

The application is applicable to the field of three-dimensional reconstruction, and provides a three-dimensional reconstruction method, a three-dimensional reconstruction device, a terminal device and a storage medium. The method comprises the following steps: acquiring an image to be processed of a target object; performing three-dimensional model vertex prediction on the image to be processed to obtain predicted vertex coordinates; acquiring a diffuse reflection map of the image to be processed, separating the high-frequency and low-frequency information contained in the diffuse reflection map, processing each separately, and then superposing them to form an image key frequency information feature map; correcting the predicted vertex coordinates using the image key frequency information feature map, and constructing a three-dimensional reconstruction model of the target object based on the corrected predicted vertex coordinates; and, to further improve realism, optimizing the three-dimensional reconstruction model using a loss function. By deeply mining image detail and contour information and finely adjusting the three-dimensional reconstruction result, the method balances the efficiency and quality of the three-dimensional reconstruction model and can improve the precision and robustness of three-dimensional reconstruction.

Description

Three-dimensional reconstruction method and device, terminal equipment and storage medium
Technical Field
The present application belongs to the field of three-dimensional reconstruction technologies, and in particular, to a three-dimensional reconstruction method, an apparatus, a terminal device, and a storage medium.
Background
With the development of deep learning technology, inferring the three-dimensional model of an object from images has become a research hotspot that has attracted wide attention from researchers. Models that reconstruct a target three-dimensionally from images can capture the low-frequency information of the object's surface contour well, and normal maps and light-and-shadow effects make the reconstruction appear relatively credible. However, such methods ignore the high-frequency detail information of the surface and cannot form a high-precision three-dimensional reconstruction model. At present, methods for obtaining high-precision models fall mainly into three categories: (1) manual creation by experienced modelers; (2) scan-capture reconstruction using professional equipment; (3) deep-learning-based methods that establish a mapping from two-dimensional images to a three-dimensional reconstruction model. Generally, the manual method can build a very refined model, but it is time-consuming and inefficient. Reconstruction with professional equipment is expensive and difficult to popularize. Compared with these two methods, deep-learning-based methods clearly have stronger semantic representation capability, but existing image-based three-dimensional reconstruction methods depend on large numbers of training samples, which limits their application range, and because they omit high-frequency texture detail information, the algorithms cannot effectively produce a high-precision model.
Generally speaking, the traditional approaches of manual modeling or scan capture with professional equipment are time-consuming and labor-intensive, and the application range and algorithmic capability of existing image-based target three-dimensional reconstruction methods are limited, so efficient and accurate three-dimensional reconstruction is difficult to achieve with existing image-based methods.
Disclosure of Invention
The embodiments of the present application provide a three-dimensional reconstruction method, a three-dimensional reconstruction device, a terminal device and a storage medium, aiming to solve the problem that existing methods for reconstructing an object's three-dimensional model from images are neither efficient nor accurate.
A first aspect of the embodiments of the present application provides a three-dimensional reconstruction method, which comprises the following steps:
acquiring an image to be processed of a target object;
performing three-dimensional model vertex prediction based on the image to be processed to obtain predicted vertex coordinates;
extracting image frequency information based on the image to be processed to obtain high-frequency detail image information and low-frequency contour image information corresponding to the image to be processed, and superposing the high-frequency detail image information and the low-frequency contour image information to obtain an image key frequency information feature map;
and displacing the predicted vertex coordinates based on the image key frequency information feature map to obtain corrected predicted vertex coordinates, and performing three-dimensional reconstruction based on the corrected predicted vertex coordinates to obtain a target three-dimensional model corresponding to the target object.
A second aspect of the embodiments of the present application provides a three-dimensional reconstruction apparatus, comprising:
the acquisition module is used for acquiring an image to be processed of a target object;
the prediction module is used for performing three-dimensional model vertex prediction based on the image to be processed to obtain predicted vertex coordinates;
the extraction module is used for extracting image frequency information based on the image to be processed to obtain high-frequency detail image information and low-frequency contour image information corresponding to the image to be processed, and superposing the high-frequency detail image information and the low-frequency contour image information to obtain an image key frequency information feature map;
and the three-dimensional reconstruction module is used for displacing the predicted vertex coordinates based on the image key frequency information feature map to obtain corrected predicted vertex coordinates, and performing three-dimensional reconstruction based on the corrected predicted vertex coordinates to obtain a target three-dimensional model corresponding to the target object.
A third aspect of the embodiments of the present application provides a terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method according to the first aspect when executing the computer program.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium in which a computer program is stored, the computer program, when executed by a processor, performing the steps of the method according to the first aspect.
A fifth aspect of the present application provides a computer program product, which, when run on a terminal, causes the terminal to perform the steps of the method of the first aspect described above.
As can be seen from the above, in this embodiment, an image to be processed of a target object is first obtained, and three-dimensional model vertex prediction is performed on it to obtain predicted vertex coordinates; at the same time, image frequency information is extracted from the image to be processed to obtain the high-frequency detail image information and low-frequency contour image information corresponding to the image to be processed, and the two are superposed to obtain an image key frequency information feature map; the predicted vertex coordinates are then displaced based on the image key frequency information feature map to obtain corrected predicted vertex coordinates, and three-dimensional reconstruction is performed based on the corrected predicted vertex coordinates to obtain a target three-dimensional model corresponding to the target object. In this process, the predicted vertices are adjusted by the frequency information contained in the key frequency information feature map extracted from the image to be processed, and the adjusted predicted vertex coordinates are used to increase the detail and contour information of the target three-dimensional model, thereby ensuring the accuracy and integrity of the generated target three-dimensional model.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a first flowchart of a three-dimensional reconstruction method provided in an embodiment of the present application;
fig. 2 is a second flowchart of a three-dimensional reconstruction method provided in an embodiment of the present application;
fig. 3 is a network structure diagram of a three-dimensional reconstruction method according to an embodiment of the present application;
fig. 4 is a third flowchart of a three-dimensional reconstruction method provided in an embodiment of the present application;
fig. 5 is a fourth flowchart of a three-dimensional reconstruction method provided in an embodiment of the present application;
fig. 6 is a structural diagram of a three-dimensional reconstruction apparatus according to an embodiment of the present application;
fig. 7 is a block diagram of a terminal according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
In particular implementations, the terminals described in the embodiments of the present application include, but are not limited to, portable devices such as mobile phones, laptop computers, or tablet computers having touch-sensitive surfaces (e.g., touch-screen displays and/or touch pads). It should also be understood that in some embodiments the device is not a portable communication device, but a desktop computer having a touch-sensitive surface (e.g., a touch-screen display and/or touchpad).
In the discussion that follows, a terminal that includes a display and a touch-sensitive surface is described. However, it should be understood that the terminal may include one or more other physical user interface devices such as a physical keyboard, mouse, and/or joystick.
The terminal supports various applications, such as one or more of the following: a drawing application, a presentation application, a word processing application, a website creation application, a disc burning application, a spreadsheet application, a gaming application, a telephone application, a video conferencing application, an email application, an instant messaging application, an exercise support application, a photo management application, a digital camera application, a web browsing application, a digital music player application, and/or a digital video player application.
Various applications that may be executed on the terminal may use at least one common physical user interface device, such as a touch-sensitive surface. One or more functions of the touch-sensitive surface and corresponding information displayed on the terminal can be adjusted and/or changed between applications and/or within respective applications. In this way, a common physical architecture (e.g., touch-sensitive surface) of the terminal can support various applications with user interfaces that are intuitive and transparent to the user.
It should be understood that the sequence numbers of the steps in this embodiment do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
In order to explain the technical solution described in the present application, the following description will be given by way of specific examples.
Referring to fig. 1, fig. 1 is a first flowchart of a three-dimensional reconstruction method provided in an embodiment of the present application. As shown in fig. 1, a three-dimensional reconstruction method includes the following steps:
step 101, acquiring an image to be processed of a target object.
In some embodiments, the image to be processed is an image obtained by photographing the target object, or an image frame captured from an existing video containing the target object.
Specifically, when the target object is photographed to obtain the image to be processed, a monocular camera may be used, in which case the image to be processed is a monocular image; alternatively, a multi-view camera may be used, in which case the image to be processed is a multi-view image.
The target object may be an animal, a person, a landscape, or the like, and the image to be processed is correspondingly a face image, a whole-body image, a landscape image, or the like.
Step 102, performing three-dimensional model vertex prediction based on the image to be processed to obtain predicted vertex coordinates.
In this embodiment, a vertex prediction network (VPNet) trained in advance is used: the image to be processed is input into the vertex prediction network, and a set of predicted vertex coordinates of the target object on the x, y and z axes is obtained.
By way of example and not limitation, the vertex prediction network may be a convolutional neural network model for object detection, such as RCNN (Region-based Convolutional Neural Networks), Fast RCNN (Fast Region-based Convolutional Neural Networks), which improves performance over RCNN, or Faster RCNN (Faster Region-based Convolutional Neural Networks), and the like. A region-based convolutional neural network model generally comprises two modules, a detection module and a frame labeling module: the first module performs frame labeling on the target object in the image to be processed, and the second module performs convolution operations on the framed target object image from the first module to predict the set of three-dimensional model vertex coordinates of the target object on the x, y and z axes. The calculation is shown in formula (1):
$\hat{V}_i = \mathrm{VPNet}(I_i)$  (1)

where $\hat{V}_i$ denotes the set of three-dimensional model predicted vertex coordinates obtained by the vertex prediction network VPNet; inputting the image to be processed $I_i$ into the vertex prediction network VPNet for vertex coordinate prediction yields the predicted vertex coordinates of the target object.
Step 103, extracting image frequency information based on the image to be processed to obtain high-frequency detail image information and low-frequency contour image information corresponding to the image to be processed, and superposing the high-frequency detail image information and the low-frequency contour image information to obtain an image key frequency information feature map.
It should be understood that image frequency is a measure of how strongly the gray level of pixels in an image varies. The gray level is the brightness of each pixel in the image, with a range of [0, 255], where white is 255 and black is 0. The image frequency reflects the degree of difference between the gray value of an image pixel and those of its neighborhood points: where the gray value changes smoothly is the low-frequency information of the image, and where it changes sharply is the high-frequency information. For example, for an image containing a human face, the frequencies of the whole image represent the face's contour and detail information: the low-frequency information represents the overall contour of the face, and the high-frequency information represents detail wrinkles of the face.
Further, to separate the high-frequency detail image information from the low-frequency contour image information, assume that the R, G, B color values of the image pixels have been extracted by an image rendering tool. With x denoting the gray value of a pixel, x = 0.2989 × R + 0.5870 × G + 0.1140 × B, so the gray value of any pixel can be obtained through this formula. Then a threshold is set; for example, a variation between a pixel's gray value and its neighborhood points within (0, 200) is treated as low frequency and a variation within (200, 255) as high frequency, and the high-frequency detail image information and low-frequency contour image information corresponding to the image to be processed are obtained by this screening.
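A minimal sketch of this threshold-based screening follows, assuming NumPy and approximating the pixel-to-neighborhood variation by the gradient magnitude of the gray image; that choice and the threshold of 200 from the example above are assumptions, since the patent does not fix a particular neighborhood operator.

```python
import numpy as np

def separate_frequencies(rgb: np.ndarray, threshold: float = 200.0):
    """rgb: (H, W, 3) image. Returns (high_freq, low_freq) gray images."""
    # Gray value of each pixel: x = 0.2989*R + 0.5870*G + 0.1140*B
    gray = (0.2989 * rgb[..., 0] + 0.5870 * rgb[..., 1]
            + 0.1140 * rgb[..., 2]).astype(float)
    # Variation between a pixel and its neighborhood points, approximated
    # here by the gradient magnitude of the gray image.
    gy, gx = np.gradient(gray)
    variation = np.hypot(gx, gy)
    low_freq = np.where(variation < threshold, gray, 0.0)    # contour info
    high_freq = np.where(variation >= threshold, gray, 0.0)  # detail info
    return high_freq, low_freq
```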
After the high-frequency detail image information and low-frequency contour image information are obtained, they can be directly superposed to form the image key frequency information feature map. The superposition of the image's high- and low-frequency information is calculated as shown in formula (2):

$F_i^{key} = F_i^{low} + F_i^{high}$  (2)

where $F_i^{low}$ and $F_i^{high}$ denote the low-frequency contour image information and the high-frequency detail image information of the image, respectively, and $F_i^{key}$ denotes the image key frequency information feature map formed by their superposition.
In a specific implementation, the high-frequency detail image information and the low-frequency contour image information can also be processed separately to eliminate the interference of noise in the frequency features and to enhance the features of each. Finally, the processed high-frequency detail image information and low-frequency contour image information are superposed to form the image key frequency information feature map.
Specifically, as an optional implementation, extracting image frequency information based on the image to be processed to obtain the high-frequency detail image information and low-frequency contour image information corresponding to the image to be processed, and superposing them to obtain the image key frequency information feature map, includes:
acquiring a diffuse reflection map of the image to be processed based on the image to be processed; separating the high- and low-frequency information of the diffuse reflection map to obtain the high-frequency detail image information and low-frequency contour image information corresponding to the image to be processed; performing regularization processing on the high-frequency detail image information and the low-frequency contour image information respectively to obtain regularized high-frequency detail image information and regularized low-frequency contour image information; and superposing the regularized high-frequency detail image information and low-frequency contour image information to form the image key frequency information feature map.
It should be understood that the diffuse reflection map contains the inherent color and texture information of the image. Since the texture information contained in the diffuse reflection map does not include illumination or shadow occlusion, the influence of such invalid information present in the image on the three-dimensional reconstruction effect is reduced to a certain extent. The calculation for extracting the diffuse reflection map is shown in formula (3):

$D_i = \mathrm{DiffNet}(I_i)$  (3)

where $D_i$ denotes the diffuse reflection map of the image to be processed $I_i$, and DiffNet denotes the diffuse reflection map network that generates the diffuse reflection map; the diffuse reflection map network may adopt an end-to-end supervised-learning neural network model, such as a CNN (Convolutional Neural Network).
It should also be understood that if the high- and low-frequency information of the image were processed together, the characteristic information of each would be damaged and the image would become blurred; therefore the high- and low-frequency information of the image needs to be separated and then regularized separately to enhance the respective frequency characteristics. The separation of the image's high- and low-frequency information is calculated as shown in formula (4):

$F_i^{low},\ F_i^{high} = \mathrm{FDAE}(D_i)$  (4)

where $F_i^{low}$ and $F_i^{high}$ denote the low-frequency contour image information and the high-frequency detail image information decomposed from the diffuse reflection map $D_i$ of the image to be processed, and FDAE denotes a frequency decoupling algorithm (Frequency Decoupling Auto-Encoder).
By way of example and not limitation, the frequency decoupling algorithm may be a Fourier transform, a wavelet transform, or the like.
In particular, the wavelet toolbox in MATLAB for image decomposition and reconstruction can be used to separate the high- and low-frequency information in image textures. The wavelet toolbox performs the image frequency decoupling as follows: select the diffuse reflection map of the image to be processed and preprocess it with the sym4 wavelet in the wavelet toolbox (i.e., load the image, display it, convert its format, and so on); then decompose the preprocessed image into high- and low-frequency information by calling the wavedec function of the wavelet toolbox, extracting the high- and low-frequency coefficients with the detcoef and appcoef functions respectively during decomposition; reconstruct the high-frequency information and low-frequency information of the image from these coefficients with the wrcoef function; regularize the high- and low-frequency information separately using the L1 norm in MATLAB; and finally superpose the generated high-frequency detail image information and low-frequency contour image information. The superposition calculation is the same as formula (2) and is not repeated here.
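The same decompose-regularize-superpose workflow can be sketched outside MATLAB; the version below uses the PyWavelets package as a stand-in for the wavedec/appcoef/detcoef/wrcoef calls above. The two-level decomposition depth and the simple L1-norm normalization standing in for the regularization step are assumptions of this sketch.

```python
import numpy as np
import pywt  # PyWavelets

def wavelet_decouple(diffuse_gray: np.ndarray, level: int = 2):
    """Split a grayscale diffuse map into low-frequency contour and
    high-frequency detail images with a sym4 wavelet decomposition."""
    coeffs = pywt.wavedec2(diffuse_gray, 'sym4', level=level)
    # Low-frequency contour image: keep only the approximation coefficients.
    low_coeffs = [coeffs[0]] + [tuple(np.zeros_like(d) for d in detail)
                                for detail in coeffs[1:]]
    low_freq = pywt.waverec2(low_coeffs, 'sym4')
    # High-frequency detail image: zero the approximation, keep the details.
    high_coeffs = [np.zeros_like(coeffs[0])] + list(coeffs[1:])
    high_freq = pywt.waverec2(high_coeffs, 'sym4')
    # L1-style normalization of each band before superposition (assumption).
    low_freq = low_freq / (np.abs(low_freq).sum() + 1e-8)
    high_freq = high_freq / (np.abs(high_freq).sum() + 1e-8)
    return high_freq, low_freq  # superposing these gives the key feature map
```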
Step 104, displacing the predicted vertex coordinates based on the image key frequency information feature map to obtain corrected predicted vertex coordinates, and performing three-dimensional reconstruction based on the corrected predicted vertex coordinates to obtain the target three-dimensional model corresponding to the target object.
It should be noted that constructing the target three-dimensional model directly from the predicted vertex coordinates does not show much of the target object's detail; although vertices can be added to the target three-dimensional model by direct tessellation, this does not add detail, but merely adds more vertices lying in the same plane as the original ones. To increase the detail of the target three-dimensional model, the positions of the predicted vertex coordinates need to be moved in some way. The predicted vertices can therefore be adjusted by the frequency information contained in the existing key frequency information feature map: since the pixel points of the image key frequency information feature map and the predicted vertex coordinates are in one-to-one correspondence, the predicted vertex coordinates can be adjusted according to the feature map, which determines both the direction and the displacement amount of the adjustment.
It should be understood that the image to be processed contains light-reflection information. For a pixel point in the image, illumination produces a line perpendicular to the tangent plane at that point, and because light reflection is irregular, this line has a direction. The direction value n of this line is stored in the red, green and blue channels as compressed x-, y- and z-axis coordinate values; the direction values n formed in this way correspond one-to-one with the vertices of the target three-dimensional model, so the direction value n can be used to determine the adjustment direction for each predicted vertex coordinate. Assuming the x-axis coordinate of a predicted vertex is a value in the range [-1, 1], the compressed x-axis component of the direction value n is obtained by the formula (0.5x + 0.5) × 255; the y-axis and z-axis coordinate values can be compressed by the same formula. The vector formed by these three values constitutes the direction value n, and multiplying the direction value n by the corresponding predicted vertex's x, y, z coordinate values gives the predicted vertex's value after the adjustment direction is applied.
It should also be understood that the image frequency stored in the image key frequency information feature map measures the intensity of gray-level variation of the image pixels, and this intensity can be used to describe the height of protrusions and depressions on the object's surface. This is defined as a height value h, which can be stored per pixel in a color channel; the height value h lies in the range [0, 255], with white represented by 0 and black by 255. Here a common gray extraction formula based on human-eye perception, a weighted average of the pixel's RGB values, can be adopted: height h = 0.2126 × R + 0.7152 × G + 0.0722 × B, with weights derived from the human eye's differing sensitivity to the three colors.
Further, the predicted vertex coordinates are displaced according to the obtained displacement direction and the displacement value along that direction; the calculation is shown in formula (5):

$p' = p + (h - 1)\,n$  (5)

where p is the current predicted vertex coordinate, p' is the corrected predicted vertex coordinate, n is the displacement direction value by which the predicted vertex coordinate is to be adjusted, and h is the height value of the displacement. Here the height value h ∈ [0, 1]; subtracting 1 from h maps its interval [0, 1] to [-1, 0]. Because the surface normal usually faces outward from the mesh, an outward offset can be replaced by an inward one, and it is generally more convenient to push the geometry inward than to pull it outward. In short, the corrected predicted vertex coordinate is the sum of the current predicted vertex coordinate p and the product of the displacement direction value n and the height term.
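A compact sketch of formula (5) and of the two decoding steps described above follows, assuming NumPy arrays; the helper names are hypothetical.

```python
import numpy as np

def decode_normal(rgb: np.ndarray) -> np.ndarray:
    # Inverse of the compression (0.5*x + 0.5) * 255 described above:
    # recover direction values n with components in [-1, 1] from RGB.
    return rgb.astype(float) / 255.0 * 2.0 - 1.0

def height_from_rgb(rgb: np.ndarray) -> np.ndarray:
    # Perception-weighted gray value used as the height, rescaled to [0, 1]:
    # h = 0.2126*R + 0.7152*G + 0.0722*B
    return (0.2126 * rgb[..., 0] + 0.7152 * rgb[..., 1]
            + 0.0722 * rgb[..., 2]) / 255.0

def displace_vertices(p: np.ndarray, n: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Formula (5): p' = p + (h - 1) * n.
    p: (N, 3) predicted vertex coordinates; n: (N, 3) per-vertex direction
    values; h: (N,) per-vertex height values in [0, 1] sampled from the key
    frequency information feature map. Subtracting 1 maps h into [-1, 0],
    pushing vertices inward along the outward-facing normal."""
    return p + (h[:, None] - 1.0) * n
```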
Furthermore, each corrected predicted vertex coordinate is connected with its adjacent predicted vertex coordinates to form a mesh, and the meshes connecting all predicted vertex coordinates form the target three-dimensional model corresponding to the target object.
Specifically, as an optional implementation, displacing the predicted vertex coordinates based on the image key frequency information feature map to obtain corrected predicted vertex coordinates, and performing three-dimensional reconstruction based on the corrected predicted vertex coordinates to obtain the target three-dimensional model corresponding to the target object, includes:
extracting a displacement map from the image key frequency information feature map, wherein the displacement map contains the displacement amount by which each predicted vertex coordinate is displaced along its normal direction; displacing the predicted vertex coordinates along the normal direction by the obtained displacement amounts to obtain the corrected predicted vertex coordinates; and constructing the target three-dimensional model corresponding to the target object based on the corrected predicted vertex coordinates.
It should be understood that the displacement map, also called a height map, can be used to describe the protrusions and depressions of the object surface; each pixel of the displacement map uses a single color channel to store a height value h. Each pixel of the normal map uses three color channels to store, for the illumination at that pixel, the direction value n perpendicular to the pixel's tangent plane. Displacement optimization can therefore be performed on the predicted vertex coordinates along a definite direction and by a definite displacement value, by obtaining the displacement amounts contained in the displacement map and the direction values in the normal map.
By way of example and not limitation, the displacement map may be created from the image to be processed of the target object with drawing software such as Photoshop, or generated by inputting the image to be processed into a neural network model, for example a Convolutional Neural Network (CNN) or Fast RCNN (Fast Region-based Convolutional Neural Networks), which improves performance over RCNN, and the like. The generated displacement map contains the displacement amounts by which the predicted vertex coordinates are displaced along their normal directions, and the predicted vertex coordinates are corrected by these displacement amounts; the calculation is shown in formula (6):

$\hat{M}_i = \mathrm{Displacement}(F_i^{key},\ \hat{V}_i)$  (6)

where Displacement(·) denotes the operation that displaces the predicted vertices, $\hat{M}_i$ denotes the generated target three-dimensional model, $F_i^{key}$ denotes the image key frequency information feature map formed by superposition, and $\hat{V}_i$ denotes the predicted vertex coordinates.
By way of example and not limitation, the normal map may be obtained directly from the image to be processed using the "3D" sub-menu of the "Filter" tool in Photoshop, or by training a neural network such as a Generative Adversarial Network (GAN) to produce the normal map of the image.
Specifically, the corrected predicted vertex coordinates are obtained by reading the height value h from the displacement map and the displacement direction value n from the normal map, and displacing each predicted vertex coordinate by the height value h along the direction value n; the calculation is the same as formula (5) and is not repeated here.
Further, by way of example and not limitation, the Open Graphics Library (OpenGL) or Computer-Aided Design (CAD) software may be used: the obtained predicted vertex coordinates are input, some other auxiliary information is set, and the target three-dimensional model corresponding to the target object is obtained directly, which is not described herein again.
In this embodiment, an image to be processed of a target object is first obtained, and three-dimensional model vertex prediction is performed on it to obtain predicted vertex coordinates; at the same time, image frequency information is extracted from the image to be processed to obtain the corresponding high-frequency detail image information and low-frequency contour image information, which are superposed to obtain the image key frequency information feature map; the predicted vertex coordinates are then displaced based on the image key frequency information feature map to obtain corrected predicted vertex coordinates, and three-dimensional reconstruction is performed based on the corrected predicted vertex coordinates to obtain the target three-dimensional model corresponding to the target object. In this process, the predicted vertex coordinates are adjusted by the frequency information contained in the key frequency information feature map extracted from the image to be processed, increasing the detail and contour information of the generated target three-dimensional model and thereby ensuring its accuracy and integrity.
Different embodiments of the three-dimensional reconstruction method are also provided in the embodiments of the present application.
Referring to fig. 2, fig. 2 is a second flowchart of the three-dimensional reconstruction method provided. As shown in fig. 2, the three-dimensional reconstruction method includes the following steps:
step 201, acquiring an image to be processed of a target object.
The implementation process of this step is the same as that of step 101 in the foregoing embodiment, and is not described here again.
Step 202, inputting the image to be processed into a vertex prediction network to obtain the preliminary predicted vertex coordinates of the image to be processed.
The implementation process of this step is the same as that of step 102 in the foregoing embodiment, and is not described here again.
Step 203, upsampling the preliminary predicted vertex coordinates, and downsampling the upsampled preliminary predicted vertex coordinates.
It should be understood that, in the field of three-dimensional reconstruction, upsampling in three-dimensional model processing is based on the following idea: two vertex coordinates of a three-dimensional model determine a line, three points determine a plane, and three vertices of the three-dimensional model form a triangular mesh; the basic processing unit of the three-dimensional model is the mesh. Given a mesh formed by three vertices, the upsampling operation interpolates vertices within the mesh region formed by those three vertices, increasing the number of vertices; the triangular mesh is thereby split into multiple meshes formed by the interpolated vertices. In effect, the added meshes increase the detail of the three-dimensional model's surface, making the model smoother and more delicate. The upsampling operation may adopt bilinear interpolation, transposed convolution, or the like, and is calculated as shown in formula (7):
$\hat{V}_i^{up} = \mathrm{Upsampling}(\hat{V}_i)$  (7)

where $\hat{V}_i$ denotes the predicted vertex coordinates of the target object and $\hat{V}_i^{up}$ denotes the set of predicted vertex coordinates after the upsampling operation.
Furthermore, the vertex coordinates after the upsampling operation are downsampled to keep the same dimension as the target object. Downsampling ensures that the number of vertices is consistent with the number of predicted vertices obtained through the vertex prediction network; the downsampling of the vertex coordinate set may use random downsampling or the like, so that the vertex prediction network can subsequently be optimized by the first preset loss function.
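As an illustration of this idea, the sketch below implements one possible interpolation-based upsampling (edge-midpoint insertion on triangular meshes) and random downsampling, assuming NumPy; since the patent also allows bilinear interpolation or transposed convolution, this concrete scheme is an assumption.

```python
import numpy as np

def upsample_vertices(vertices: np.ndarray, faces: np.ndarray) -> np.ndarray:
    """Insert the midpoint of each edge of every triangular mesh as a new
    vertex, splitting each triangle's area into smaller meshes. Midpoints
    of shared edges are duplicated here for simplicity."""
    v0, v1, v2 = vertices[faces[:, 0]], vertices[faces[:, 1]], vertices[faces[:, 2]]
    midpoints = np.concatenate([(v0 + v1) / 2, (v1 + v2) / 2, (v2 + v0) / 2])
    return np.concatenate([vertices, midpoints])

def downsample_vertices(vertices: np.ndarray, target_count: int) -> np.ndarray:
    # Random downsampling back to the vertex count produced by VPNet, so the
    # first preset loss function can compare against the real vertices.
    idx = np.random.choice(len(vertices), size=target_count, replace=False)
    return vertices[idx]
```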
Step 204, combining the real vertex coordinates of the target object, performing vertex prediction constraint on the downsampled preliminary predicted vertex coordinates using a first preset loss function to obtain the predicted vertex coordinates, wherein the first preset loss function is used to calculate the difference between the real vertex coordinates and the downsampled preliminary predicted vertex coordinates.
As can be seen from the arrow pointing to the "first preset loss function" in fig. 3, calculating the first preset loss function requires the downsampled version of the upsampled preliminary predicted vertex coordinates and the real vertex coordinates of the target object used to train the vertex prediction network VPNet. The difference between the two is computed by the first preset loss function, and the vertex prediction network parameters are continuously adjusted by this difference so that the predicted vertex coordinates approach the real vertex coordinates as closely as possible, until the difference between the two is minimized.
By way of example and not limitation, the first preset loss function may be a mean squared error (MSE) function; alternatively, the first preset loss function may adopt the mean absolute error, an absolute-value loss function, a logarithmic loss function, or the like as needed. The first preset loss function calculated using the mean squared error is shown in formula (8):
$L_1 = \mathrm{MSE}\big(V_i,\ \mathrm{Downsampling}(\hat{V}_i^{up})\big)$  (8)

where MSE is the mean squared error, Downsampling denotes the downsampling operation performed on the upsampled preliminary predicted vertex coordinates, $V_i$ is the real vertex coordinates, and $\hat{V}_i^{up}$ is the preliminary predicted vertex coordinates after the upsampling operation.
It should be understood that the mean squared error MSE is the average of the squared differences between the predicted values and the true values of the parameters; the smaller the obtained value, the more accurate the current vertex prediction network model. The parameters of the vertex prediction network model can therefore be adjusted by predicting the vertex coordinates many times and observing how the predicted values change, so as to obtain a more accurate vertex prediction effect.
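As a sketch, the first preset loss function of formula (8) reduces to an MSE over vertex coordinates (NumPy assumed):

```python
import numpy as np

def first_preset_loss(real_vertices: np.ndarray,
                      downsampled_pred_vertices: np.ndarray) -> float:
    """Formula (8): MSE between the real vertex coordinates V_i and the
    downsampled preliminary predicted vertex coordinates, both (N, 3).
    The smaller the value, the more accurate the vertex prediction network."""
    diff = real_vertices.astype(float) - downsampled_pred_vertices.astype(float)
    return float(np.mean(diff ** 2))
```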
Step 205, extracting image frequency information based on the image to be processed to obtain the high-frequency detail image information and low-frequency contour image information corresponding to the image to be processed, and superposing the high-frequency detail image information and the low-frequency contour image information to obtain the image key frequency information feature map.
the implementation process of this step is the same as the implementation process of step 103 in the foregoing embodiment, and is not described here again.
Step 206, displacing the predicted vertex coordinates based on the image key frequency information feature map to obtain corrected predicted vertex coordinates, and performing three-dimensional reconstruction based on the corrected predicted vertex coordinates to obtain the target three-dimensional model corresponding to the target object.
The implementation process of this step is the same as that of step 104 in the foregoing embodiment, and is not described here again.
In this embodiment of the application, an image to be processed of a target object is obtained and input into a vertex prediction network to obtain preliminary predicted vertex coordinates; the preliminary predicted vertex coordinates are upsampled and downsampled, and vertex prediction constraint is then performed on the downsampled preliminary predicted vertex coordinates using a preset loss function to obtain the predicted vertex coordinates. At the same time, image frequency information is extracted from the image to be processed to obtain the corresponding high-frequency detail image information and low-frequency contour image information, which are superposed to obtain the image key frequency information feature map. The predicted vertex coordinates are then displaced based on the image key frequency information feature map to obtain corrected predicted vertex coordinates, and three-dimensional reconstruction is performed based on the corrected predicted vertex coordinates to obtain the target three-dimensional model corresponding to the target object. In this process, the predicted vertex coordinates are adjusted by the frequency information contained in the key frequency information feature map extracted from the image to be processed, increasing the detail and contour information of the generated target three-dimensional model; further, a loss function is adopted to constrain the predicted vertex coordinates, ensuring their accuracy and, on that basis, the accuracy and robustness of the generated target three-dimensional model.
When the image key frequency information feature map is obtained based on the diffuse reflection map of the image to be processed, referring to fig. 4, after displacing the predicted vertex coordinates based on the image key frequency information feature map to obtain corrected predicted vertex coordinates and performing three-dimensional reconstruction based on the corrected predicted vertex coordinates to obtain the target three-dimensional model corresponding to the target object, the method further includes:
step 401, mapping the diffuse reflection map to a target three-dimensional model to obtain a target three-dimensional model with inherent color and texture, and projecting the target three-dimensional model with inherent color and texture to a two-dimensional pixel space to obtain a rendered two-dimensional image.
It should be understood that the pixel points of the diffuse reflection map contain the inherent color and texture of the image, and a pixel point on the two-dimensional plane map can be represented by u and v coordinates, with u in the horizontal direction and v in the vertical direction; the vertices of the target three-dimensional model have coordinates x, y and z. The pixel points of the diffuse reflection map and the vertices of the target three-dimensional model are in one-to-one correspondence, so the content at each (u, v) position can be pasted onto the corresponding vertex coordinates of the target three-dimensional model, giving the target three-dimensional model the inherent color and texture of the diffuse reflection map.
Step 402, based on the two-dimensional image, combining the pixel values of the image to be processed, performing pixel consistency constraint on the target three-dimensional model using a second preset loss function to obtain an optimized target three-dimensional model; the second preset loss function is used to calculate the pixel difference between the two-dimensional image and the image to be processed.
Specifically, as can be seen from the arrow pointing to the "second preset loss function" in fig. 3, the second preset loss function compares, at the same pixel positions, the corresponding two-dimensional image obtained by projecting the target three-dimensional model into the two-dimensional pixel space with the image to be processed $I_i$ of the target object, and the target three-dimensional model is optimized by calculating the pixel-value difference between them. The second preset loss function calculated using the mean squared error is shown in formula (9):

$L_2 = \mathrm{MSE}\big(\mathrm{Render}(\hat{M}_i,\ D_i),\ I_i\big)$  (9)

where Render(·) denotes the rendering operation, which renders the three-dimensional object into a corresponding two-dimensional image for processing, $\hat{M}_i$ is the target three-dimensional model, $D_i$ is the diffuse reflection map of the image to be processed, and $I_i$ is the image to be processed. As before, the second preset loss function may also adopt the mean absolute error, an absolute-value loss function, a logarithmic loss function, or the like as needed, consistent with the first preset loss function in step 204, which is not repeated here.
In this embodiment, the diffuse reflection map is mapped onto the target three-dimensional model to obtain a target three-dimensional model with inherent color and texture, and this model is projected into the two-dimensional pixel space to obtain a rendered two-dimensional image; then, based on the two-dimensional image and combining the pixel values of the image to be processed, pixel consistency constraint is performed on the target three-dimensional model using a second preset loss function to obtain an optimized target three-dimensional model, the second preset loss function being used to calculate the pixel difference between the two-dimensional image and the image to be processed. By calculating the pixel difference between the images and using it to optimize the target three-dimensional model, the accuracy and precision of the target three-dimensional model are improved.
Further, referring to fig. 5, after displacing the predicted vertex coordinates based on the image key frequency information feature map to obtain corrected predicted vertex coordinates and performing three-dimensional reconstruction based on the corrected predicted vertex coordinates to obtain the target three-dimensional model corresponding to the target object, the method may further include:
step 501, projecting the target three-dimensional model to a two-dimensional pixel space to obtain a two-dimensional image rendered by the target three-dimensional model.
It should be understood that the two-dimensional image here is obtained by projecting into the two-dimensional pixel space the target three-dimensional model to which the diffuse reflection map has not been applied, so as to avoid the influence of information such as the inherent color and texture contained in the diffuse reflection map on the image frequency information.
Step 502, extracting image frequency information from the image to be processed and from the two-dimensional image respectively; taking the image frequency information extracted from the image to be processed as the reference for the image frequency information extracted from the two-dimensional image, performing image frequency consistency constraint on the target three-dimensional model using a third preset loss function to obtain an optimized target three-dimensional model; the third preset loss function is used to calculate the image frequency difference between the two-dimensional image and the image to be processed.
It should be understood that, as can be seen from the arrow pointing to the "third preset loss function" in fig. 3, the third preset loss function compares the image frequency information contained in the corresponding two-dimensional image obtained by projecting the target three-dimensional model into the two-dimensional pixel space with the image frequency information of the image to be processed of the target object, and the target three-dimensional model is optimized by calculating the image frequency difference between the two-dimensional image and the image to be processed. The third preset loss function calculated using the mean squared error function is shown in formula (10):

$L_3 = \mathrm{MSE}\big(\mathrm{Extract}(\mathrm{Render}(\hat{M}_i)),\ \mathrm{Extract}(I_i)\big)$  (10)

where Extract(·) denotes the image frequency extraction operation, $\mathrm{Render}(\hat{M}_i)$ is the two-dimensional image acquired by projecting the target three-dimensional model into two-dimensional space, $\mathrm{Extract}(\mathrm{Render}(\hat{M}_i))$ is the frequency information extracted from that two-dimensional image, and $\mathrm{Extract}(I_i)$ is the image frequency information of the image to be processed of the target object. As before, the third preset loss function may also adopt the mean absolute error, an absolute-value loss function, a logarithmic loss function, or the like as needed, consistent with the first preset loss function in step 204, which is not repeated here.
In this embodiment, the target three-dimensional model is projected into the two-dimensional pixel space to obtain a two-dimensional image rendered from the target three-dimensional model; taking the image frequency information extracted from the image to be processed as the reference for the image frequency information extracted from the two-dimensional image, image frequency consistency constraint is performed on the target three-dimensional model using a third preset loss function to obtain an optimized target three-dimensional model, the third preset loss function being used to calculate the image frequency difference between the two-dimensional image and the image to be processed. Optimizing the target three-dimensional model with this difference improves the model's accuracy and precision.
Further, in order to achieve better three-dimensional reconstruction, an overall optimization loss function of the three-dimensional reconstruction method is designed from the three preset loss functions.
Specifically, the first preset loss function is given a first weight, the second preset loss function a second weight, and the third preset loss function a third weight, where the first, second and third weights are all adjustable. This is shown in formula (11):
Figure BDA0003417848050000172
wherein, Loss represents the overall optimization objective,
Figure BDA0003417848050000173
weights of the first preset loss function, the second preset loss function and the third preset loss function are respectively expressed, and the three-dimensional reconstruction model can have different reconstruction effects through adjustment of the weights. For example, by adjusting a first predetermined loss function
Figure BDA0003417848050000174
After weighting, more accurate predicted vertex coordinates are formed, and the second preset loss function is adjusted
Figure BDA0003417848050000175
After weighting, the three-dimensional reconstruction model rendering image can be optimized, and the third preset loss function is adjusted
Figure BDA0003417848050000176
After weighting, the detail information of the three-dimensional reconstruction model can be more prominent, and a better three-dimensional reconstruction effect can be obtained by adjusting the weight values of the three loss functions.
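For illustration, a minimal sketch of the overall objective in formula (11), with the three weights exposed as adjustable parameters (all names are hypothetical):

```python
def overall_loss(loss_1: float, loss_2: float, loss_3: float,
                 w1: float = 1.0, w2: float = 1.0, w3: float = 1.0) -> float:
    # Weighted sum of the first, second, and third preset loss functions,
    # as in formula (11); w1, w2, and w3 are the adjustable weights.
    return w1 * loss_1 + w2 * loss_2 + w3 * loss_3
```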
Referring to fig. 6, fig. 6 is a structural diagram of a three-dimensional reconstruction apparatus provided in an embodiment of the present application; for convenience of description, only the parts related to this embodiment are shown.
The three-dimensional reconstruction apparatus 600 includes:
an obtaining module 601, configured to obtain an image to be processed of a target object;
the prediction module 602 is configured to perform three-dimensional model vertex prediction based on the image to be processed to obtain predicted vertex coordinates;
the extraction module 603 is configured to extract image frequency information based on the image to be processed, obtain high-frequency detail image information and low-frequency contour image information corresponding to the image to be processed, and superimpose the high-frequency detail image information and the low-frequency contour image information to obtain an image key frequency information feature map;
and the three-dimensional reconstruction module 604 is configured to displace the predicted vertex coordinates based on the image key frequency information feature map to obtain corrected predicted vertex coordinates, and to perform three-dimensional reconstruction based on the corrected predicted vertex coordinates to obtain a target three-dimensional model corresponding to the target object.
The prediction module 602 is specifically configured to:
inputting the image to be processed into a vertex prediction network to obtain preliminary predicted vertex coordinates of the image to be processed;
up-sampling the preliminary predicted vertex coordinates, and down-sampling the up-sampled preliminary predicted vertex coordinates;
and combining the real vertex coordinates of the target object, and performing a vertex prediction constraint on the down-sampled preliminary predicted vertex coordinates by using a first preset loss function to obtain the predicted vertex coordinates, wherein the first preset loss function is used for calculating a difference value between the real vertex coordinates and the down-sampled preliminary predicted vertex coordinates.
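For illustration only, a minimal sketch of the first preset loss function, assuming the difference value is computed as a mean square error over the vertex coordinates (the application equally allows other difference measures; the names are hypothetical):

```python
import numpy as np

def vertex_prediction_loss(pred_vertices: np.ndarray, gt_vertices: np.ndarray) -> float:
    # First preset loss: difference between the down-sampled preliminary
    # predicted vertex coordinates, shape (N, 3), and the real vertex
    # coordinates of the target object, here as a mean square error.
    return float(np.mean((pred_vertices - gt_vertices) ** 2))
```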
the extracting module 603 is specifically configured to:
acquiring a diffuse reflection map of the image to be processed based on the image to be processed;
separating high-frequency and low-frequency information of the diffuse reflection mapping to obtain high-frequency detail image information and low-frequency contour image information corresponding to the image to be processed;
respectively carrying out regularization processing on the high-frequency detail image information and the low-frequency contour image information to obtain regularized high-frequency detail image information and regularized low-frequency contour image information;
and superposing the regularized high-frequency detail image information and low-frequency contour image information to form the image key frequency information feature map.
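The application leaves the concrete separation and regularization schemes open; a minimal NumPy/SciPy sketch, assuming a Gaussian low-pass for the low-frequency contour band, the residual for the high-frequency detail band, and min-max regularization of each band before superposition, could be:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def key_frequency_feature_map(albedo: np.ndarray, sigma: float = 3.0) -> np.ndarray:
    # albedo: single-channel diffuse reflection map as a 2-D float array.
    low = gaussian_filter(albedo, sigma=sigma)  # low-frequency contour information
    high = albedo - low                         # high-frequency detail information

    def regularize(band: np.ndarray) -> np.ndarray:
        # Min-max regularization of one frequency band (a hypothetical choice).
        span = band.max() - band.min()
        return (band - band.min()) / span if span > 0 else np.zeros_like(band)

    # Superpose the regularized bands into the key frequency feature map.
    return regularize(high) + regularize(low)
```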
Correspondingly, the extracting module 603 is further configured to:
mapping the diffuse reflection map to a target three-dimensional model to obtain a target three-dimensional model with inherent color and texture, and projecting the target three-dimensional model with inherent color and texture to a two-dimensional pixel space to obtain a rendered two-dimensional image;
based on the two-dimensional image and in combination with the pixel values of the image to be processed, performing a pixel consistency constraint on the target three-dimensional model by using a second preset loss function to obtain an optimized target three-dimensional model; wherein the second preset loss function is used for calculating the pixel difference value between the two-dimensional image and the image to be processed.
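For illustration, the second preset loss function may, under the same mean-square-error choice mentioned for the other losses, be sketched as follows (the names are hypothetical):

```python
import numpy as np

def pixel_consistency_loss(rendered: np.ndarray, target: np.ndarray) -> float:
    # Second preset loss: pixel difference between the rendered two-dimensional
    # image and the image to be processed, here as a mean square error.
    diff = rendered.astype(np.float64) - target.astype(np.float64)
    return float(np.mean(diff ** 2))
```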
The three-dimensional reconstruction module 604 is specifically configured to:
extracting a displacement map from the image key frequency information feature map, wherein the displacement map contains the displacement amount by which each predicted vertex coordinate is to be displaced along its normal direction;
displacing each predicted vertex coordinate along its normal direction by the obtained displacement amount to obtain the corrected predicted vertex coordinates;
and constructing a target three-dimensional model corresponding to the target object based on the corrected predicted vertex coordinates.
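For illustration only, a minimal NumPy sketch of this vertex correction step, assuming one signed displacement amount per vertex is sampled from the displacement map (the array shapes are assumptions):

```python
import numpy as np

def displace_vertices(vertices: np.ndarray, normals: np.ndarray,
                      displacement: np.ndarray) -> np.ndarray:
    # vertices:     (N, 3) predicted vertex coordinates
    # normals:      (N, 3) vertex normals
    # displacement: (N,)   signed displacement amounts from the displacement map
    unit = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    # Move each predicted vertex along its unit normal to correct it.
    return vertices + displacement[:, None] * unit
```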
Correspondingly, the three-dimensional reconstruction module 604 is further configured to:
projecting the target three-dimensional model to a two-dimensional pixel space to obtain a two-dimensional image rendered by the target three-dimensional model;
respectively extracting image frequency information from the image to be processed and the two-dimensional image, referring to the extracted image frequency information of the image to be processed based on the extracted image frequency information of the two-dimensional image, and performing image frequency consistency constraint on the target three-dimensional model by using a third preset loss function to obtain an optimized target three-dimensional model; and the third preset loss function is used for calculating an image frequency difference value between the two-dimensional image and the image to be processed.
The three-dimensional reconstruction apparatus provided in this embodiment of the present application can implement each process of the above three-dimensional reconstruction method embodiments and achieve the same technical effects; to avoid repetition, details are not described here again.
Fig. 7 is a block diagram of a terminal according to an embodiment of the present application. As shown in the figure, the terminal 7 of this embodiment includes: at least one processor 70 (only one is shown in fig. 7), a memory 71, and a computer program 72 stored in the memory 71 and executable on the at least one processor 70; when executing the computer program 72, the processor 70 implements the steps in any of the method embodiments described above.
The terminal 7 may be a desktop computer, a notebook, a palmtop computer, a cloud server, or another computing device. The terminal 7 may include, but is not limited to, the processor 70 and the memory 71. Those skilled in the art will appreciate that fig. 7 is only an example of the terminal 7 and does not constitute a limitation of it; the terminal may comprise more or fewer components than those shown, combine some components, or use different components; for example, it may further comprise input/output devices, network access devices, buses, and the like.
The processor 70 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 71 may be an internal storage unit of the terminal 7, such as a hard disk or a memory of the terminal 7. The memory 71 may also be an external storage device of the terminal 7, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) and the like provided on the terminal 7. Further, the memory 71 may also include both an internal storage unit and an external storage device of the terminal 7. The memory 71 is used for storing the computer program and other programs and data required by the terminal. The memory 71 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal and method may be implemented in other ways. For example, the above-described apparatus/terminal embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow in the methods of the embodiments described above may be implemented by a computer program, which may be stored in a computer-readable storage medium and which, when executed by a processor, implements the steps of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals.
The present application may also implement all or part of the processes in the methods of the above embodiments by means of a computer program product: when the computer program product runs on a terminal, the terminal, in executing it, implements the steps of the above method embodiments.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A method of three-dimensional reconstruction, comprising:
acquiring an image to be processed of a target object;
performing three-dimensional model vertex prediction based on the image to be processed to obtain predicted vertex coordinates;
extracting image frequency information based on the image to be processed to obtain high-frequency detail image information and low-frequency contour image information corresponding to the image to be processed, and superposing the high-frequency detail image information and the low-frequency contour image information to obtain an image key frequency information feature map;
and displacing the predicted vertex coordinates based on the image key frequency information feature map to obtain corrected predicted vertex coordinates, and performing three-dimensional reconstruction based on the corrected predicted vertex coordinates to obtain a target three-dimensional model corresponding to the target object.
2. The three-dimensional reconstruction method according to claim 1, wherein performing three-dimensional model vertex prediction based on the image to be processed to obtain predicted vertex coordinates comprises:
inputting the image to be processed into a vertex prediction network to obtain a preliminary prediction vertex coordinate of the image to be processed;
up-sampling the preliminary prediction vertex coordinates, and down-sampling the up-sampled preliminary prediction vertex coordinates;
and combining the real vertex coordinates of the target object, and performing vertex prediction constraint on the preliminarily predicted vertex coordinates after down-sampling by utilizing a first preset loss function to obtain the predicted vertex coordinates, wherein the first preset loss function is used for calculating a difference value between the real vertex coordinates and the preliminarily predicted vertex coordinates after down-sampling.
3. The three-dimensional reconstruction method according to claim 1, wherein extracting image frequency information based on the image to be processed to obtain high-frequency detail image information and low-frequency contour image information corresponding to the image to be processed, and superimposing the high-frequency detail image information and the low-frequency contour image information to obtain an image key frequency information feature map comprises:
acquiring a diffuse reflection map of the image to be processed based on the image to be processed;
separating high-frequency and low-frequency information of the diffuse reflection map to obtain high-frequency detail image information and low-frequency contour image information corresponding to the image to be processed;
performing regularization processing on the high-frequency detail image information and the low-frequency contour image information respectively to obtain regularized high-frequency detail image information and regularized low-frequency contour image information;
and superposing the regularized high-frequency detail image information and low-frequency contour image information to form the image key frequency information feature map.
4. The three-dimensional reconstruction method according to claim 3, wherein, after the predicted vertex coordinates are displaced based on the image key frequency information feature map to obtain corrected predicted vertex coordinates, and the three-dimensional reconstruction is performed based on the corrected predicted vertex coordinates to obtain a target three-dimensional model corresponding to the target object, the method further comprises:
mapping the diffuse reflection map to the target three-dimensional model to obtain the target three-dimensional model with inherent color and texture, and projecting the target three-dimensional model with inherent color and texture to a two-dimensional pixel space to obtain a rendered two-dimensional image;
based on the two-dimensional image and in combination with the pixel values of the image to be processed, performing a pixel consistency constraint on the target three-dimensional model by using a second preset loss function to obtain the optimized target three-dimensional model; wherein the second preset loss function is used for calculating the pixel difference value between the two-dimensional image and the image to be processed.
5. The three-dimensional reconstruction method according to claim 1, wherein displacing the predicted vertex coordinates based on the image key frequency information feature map to obtain corrected predicted vertex coordinates, and performing three-dimensional reconstruction based on the corrected predicted vertex coordinates to obtain a target three-dimensional model corresponding to the target object comprises:
extracting a displacement map from the image key frequency information feature map, wherein the displacement map contains the displacement amount by which each predicted vertex coordinate is to be displaced along its normal direction;
displacing the predicted vertex coordinate along the normal direction according to the obtained displacement to obtain a corrected predicted vertex coordinate;
and constructing a target three-dimensional model corresponding to the target object based on the corrected predicted vertex coordinates.
6. The three-dimensional reconstruction method according to claim 1, wherein, after the predicted vertex coordinates are displaced based on the image key frequency information feature map to obtain corrected predicted vertex coordinates, and a target three-dimensional model corresponding to the target object is obtained based on the corrected predicted vertex coordinates by performing three-dimensional reconstruction, the method further comprises:
projecting the target three-dimensional model to a two-dimensional pixel space to obtain a two-dimensional image rendered by the target three-dimensional model;
respectively extracting image frequency information from the image to be processed and the two-dimensional image, referring to the extracted image frequency information of the image to be processed based on the extracted image frequency information of the two-dimensional image, and performing image frequency consistency constraint on the target three-dimensional model by using a third preset loss function to obtain an optimized target three-dimensional model; and the third preset loss function is used for calculating an image frequency difference value between the two-dimensional image and the image to be processed.
7. A three-dimensional reconstruction apparatus, comprising:
the acquisition module is used for acquiring an image to be processed of a target object;
the prediction module is used for carrying out three-dimensional model vertex prediction based on the image to be processed to obtain a prediction vertex coordinate;
the extraction module is used for extracting image frequency information based on the image to be processed to obtain high-frequency detail image information and low-frequency contour image information corresponding to the image to be processed, and superposing the high-frequency detail image information and the low-frequency contour image information to obtain an image key frequency information feature map;
and the three-dimensional reconstruction module is used for displacing the predicted vertex coordinates based on the image key frequency information feature map to obtain corrected predicted vertex coordinates, and performing three-dimensional reconstruction based on the corrected predicted vertex coordinates to obtain a target three-dimensional model corresponding to the target object.
8. The three-dimensional reconstruction apparatus of claim 7, wherein the prediction module is specifically configured to:
inputting the image to be processed into a vertex prediction network to obtain a preliminary prediction vertex coordinate of the image to be processed;
up-sampling the preliminary prediction vertex coordinates, and down-sampling the up-sampled preliminary prediction vertex coordinates;
and combining the real vertex coordinates of the target object, and performing vertex prediction constraint on the preliminarily predicted vertex coordinates after down-sampling by utilizing a first preset loss function to obtain the predicted vertex coordinates, wherein the first preset loss function is used for calculating a difference value between the real vertex coordinates and the preliminarily predicted vertex coordinates after down-sampling.
9. A terminal device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method according to any one of claims 1 to 6 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
CN202111553514.6A 2021-12-17 2021-12-17 Three-dimensional reconstruction method and device, terminal equipment and storage medium Pending CN114373056A (en)

Priority Applications (1)

Application Number CN202111553514.6A, priority and filing date 2021-12-17: Three-dimensional reconstruction method and device, terminal equipment and storage medium

Publications (1)

Publication Number CN114373056A, published 2022-04-19

Family ID: 81139739

Country Status (1): CN, CN114373056A (en)



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination