CN114373056A - Three-dimensional reconstruction method and device, terminal equipment and storage medium - Google Patents
- Publication number
- CN114373056A (application number CN202111553514.6A)
- Authority
- CN
- China
- Prior art keywords
- image
- frequency
- processed
- dimensional
- vertex coordinates
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T17/20—Finite element generation, e.g. wire-frame surface description, tesselation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/13—Edge detection
Abstract
The application is applicable to the field of three-dimensional reconstruction, and provides a three-dimensional reconstruction method, a three-dimensional reconstruction device, a terminal device, and a storage medium. The method comprises the following steps: acquiring an image to be processed of a target object; performing three-dimensional model vertex prediction on the image to be processed to obtain predicted vertex coordinates; acquiring a diffuse reflection map of the image to be processed, separating the high-frequency and low-frequency information contained in the diffuse reflection map, processing the high-frequency and low-frequency information separately, and then superimposing them to form an image key frequency information feature map; correcting the predicted vertex coordinates with the image key frequency information feature map, and constructing a three-dimensional reconstruction model of the target object based on the corrected predicted vertex coordinates; to further improve realism, the three-dimensional reconstruction model is optimized with a loss function. By deeply mining image detail and contour information and finely adjusting the three-dimensional reconstruction result, the method balances the efficiency and quality of the three-dimensional reconstruction model and can improve the precision and robustness of three-dimensional reconstruction.
Description
Technical Field
The present application belongs to the field of three-dimensional reconstruction technologies, and in particular, to a three-dimensional reconstruction method, an apparatus, a terminal device, and a storage medium.
Background
With the development of deep learning, inferring the three-dimensional model of an object from images has become a research hotspot that attracts wide attention. Image-based three-dimensional reconstruction can capture the low-frequency contour information of an object's surface well, and normal maps and lighting effects make the reconstructed model look fairly plausible. However, such methods ignore high-frequency surface detail and therefore cannot produce a high-precision three-dimensional model. Current methods for obtaining high-precision models fall into three main categories: (1) manual modeling by experienced artists; (2) scan-and-capture reconstruction with specialized equipment; (3) deep-learning methods that learn a mapping from two-dimensional images to a three-dimensional model. Manual modeling can produce very refined models but is time-consuming and inefficient; reconstruction with professional equipment is expensive and hard to popularize. Compared with these two approaches, deep-learning methods have stronger semantic representation capability, but existing image-based reconstruction methods depend on large numbers of training samples, which limits their applicability, and because they neglect high-frequency texture detail they cannot produce high-precision models.
In short, the traditional approach of manual modeling or scan capture with professional equipment is time-consuming and labor-intensive, while existing image-based reconstruction methods are limited in application range and algorithmic capability; as a result, efficient and accurate three-dimensional reconstruction is difficult to achieve with existing image-based methods.
Disclosure of Invention
The embodiments of the present application provide a three-dimensional reconstruction method, a three-dimensional reconstruction device, a terminal device, and a storage medium, aiming to solve the problem that existing methods for reconstructing an object's three-dimensional model from images are neither efficient nor accurate.
A first aspect of the embodiments of the present application provides a three-dimensional reconstruction method, which comprises the following steps:
acquiring an image to be processed of a target object;
performing three-dimensional model vertex prediction based on the image to be processed to obtain predicted vertex coordinates;

extracting image frequency information based on the image to be processed to obtain high-frequency detail image information and low-frequency contour image information corresponding to the image to be processed, and superimposing the high-frequency detail image information and the low-frequency contour image information to obtain an image overall key information feature map;

and displacing the predicted vertex coordinates based on the image key frequency information feature map to obtain corrected predicted vertex coordinates, and performing three-dimensional reconstruction based on the corrected predicted vertex coordinates to obtain a target three-dimensional model corresponding to the target object.
A second aspect of the embodiments of the present application provides a three-dimensional reconstruction apparatus, including:

an acquisition module, used for acquiring an image to be processed of a target object;

a prediction module, used for performing three-dimensional model vertex prediction based on the image to be processed to obtain predicted vertex coordinates;

an extraction module, used for extracting image frequency information based on the image to be processed to obtain high-frequency detail image information and low-frequency contour image information corresponding to the image to be processed, and superimposing the high-frequency detail image information and the low-frequency contour image information to obtain an image overall key information feature map;

and a three-dimensional reconstruction module, used for displacing the predicted vertex coordinates based on the image key frequency information feature map to obtain corrected predicted vertex coordinates, and performing three-dimensional reconstruction based on the corrected predicted vertex coordinates to obtain a target three-dimensional model corresponding to the target object.
A third aspect of embodiments of the present application provides a terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the method according to the first aspect when executing the computer program.
A fourth aspect of embodiments of the present application provides a computer-readable storage medium, in which a computer program is stored, which, when executed by a processor, performs the steps of the method according to the first aspect.
A fifth aspect of the present application provides a computer program product, which, when run on a terminal, causes the terminal to perform the steps of the method of the first aspect described above.
As can be seen from the above, in this embodiment an image to be processed of a target object is first acquired, and three-dimensional model vertex prediction is then performed on it to obtain predicted vertex coordinates. At the same time, image frequency information is extracted from the image to be processed to obtain the corresponding high-frequency detail image information and low-frequency contour image information, which are superimposed to obtain the image overall key information feature map. The predicted vertex coordinates are then displaced based on the image key frequency information feature map to obtain corrected predicted vertex coordinates, and three-dimensional reconstruction is performed based on the corrected coordinates to obtain the target three-dimensional model corresponding to the target object. In this process, the predicted vertices are adjusted using the frequency information in the key information feature map extracted from the image to be processed, and the adjusted predicted vertex coordinates add detail and contour information to the target three-dimensional model, ensuring the accuracy and completeness of the generated model.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a first flowchart of a three-dimensional reconstruction method provided in an embodiment of the present application;
fig. 2 is a second flowchart of a three-dimensional reconstruction method provided in an embodiment of the present application;
fig. 3 is a network structure diagram of a three-dimensional reconstruction method according to an embodiment of the present application;
fig. 4 is a third flowchart of a three-dimensional reconstruction method provided in an embodiment of the present application;
fig. 5 is a fourth flowchart of a three-dimensional reconstruction method provided in an embodiment of the present application;
fig. 6 is a structural diagram of a three-dimensional reconstruction apparatus according to an embodiment of the present application;
fig. 7 is a block diagram of a terminal according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
In particular implementations, the terminals described in the embodiments of the present application include, but are not limited to, portable devices such as mobile phones, laptop computers, or tablet computers having touch-sensitive surfaces (e.g., touch-screen displays and/or touch pads). It should also be understood that, in some embodiments, the device may not be a portable communication device but a desktop computer having a touch-sensitive surface (e.g., a touch-screen display and/or touchpad).
In the discussion that follows, a terminal that includes a display and a touch-sensitive surface is described. However, it should be understood that the terminal may include one or more other physical user interface devices such as a physical keyboard, mouse, and/or joystick.
The terminal supports various applications, such as one or more of the following: a drawing application, a presentation application, a word processing application, a website creation application, a disc burning application, a spreadsheet application, a gaming application, a telephone application, a video conferencing application, an email application, an instant messaging application, an exercise support application, a photo management application, a digital camera application, a web browsing application, a digital music player application, and/or a digital video player application.
Various applications that may be executed on the terminal may use at least one common physical user interface device, such as a touch-sensitive surface. One or more functions of the touch-sensitive surface and corresponding information displayed on the terminal can be adjusted and/or changed between applications and/or within respective applications. In this way, a common physical architecture (e.g., touch-sensitive surface) of the terminal can support various applications with user interfaces that are intuitive and transparent to the user.
It should be understood that, the sequence numbers of the steps in this embodiment do not mean the execution sequence, and the execution sequence of each process should be determined by the function and the inherent logic of the process, and should not constitute any limitation to the implementation process of the embodiment of the present application.
In order to explain the technical solution described in the present application, the following description will be given by way of specific examples.
Referring to fig. 1, fig. 1 is a first flowchart of a three-dimensional reconstruction method provided in an embodiment of the present application. As shown in fig. 1, a three-dimensional reconstruction method includes the following steps:
Step 101, acquiring an image to be processed of a target object.

In some embodiments, the image to be processed is an image obtained by photographing the target object, or an image frame captured from an existing video containing the target object.
Specifically, when the target object is photographed to obtain the image to be processed, a monocular camera may be used, in which case the image to be processed is a monocular image; alternatively, a multi-view camera may be used, in which case the image to be processed is a multi-view image.
The target object may be an animal, a person, a landscape, or the like, and the corresponding image to be processed is accordingly a face image, a whole-body image, a landscape image, or the like.
Step 102, performing three-dimensional model vertex prediction based on the image to be processed to obtain predicted vertex coordinates.
In this embodiment, a pre-trained Vertex Prediction Network (VPNet) is used: the image to be processed is input into the vertex prediction network, which outputs the set of predicted vertex coordinates of the target object on the x, y, and z axes.
By way of example and not limitation, the vertex prediction network may be a convolutional neural network model for object detection, such as RCNN (Region-based Convolutional Neural Network), Fast RCNN, which improves performance over RCNN, or Faster RCNN, and the like. A region-based convolutional neural network model generally comprises two modules: a detection module, which draws a bounding box around the target object in the image to be processed; and a processing module, which performs convolution operations on the boxed target object image from the first module to predict the set of three-dimensional model vertex coordinates of the target object on the x, y, and z axes. The calculation formula is shown as formula (1):
$\hat{V}_i = \mathrm{VPNet}(I_i)$    (1)

where $\hat{V}_i$ denotes the set of predicted three-dimensional model vertex coordinates output by the vertex prediction network VPNet: the image to be processed $I_i$ is input into VPNet for vertex coordinate prediction, yielding the predicted vertex coordinates of the target object.
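To make the input/output contract of formula (1) concrete, here is a schematic, framework-free sketch. This is not the patent's actual network: `vpnet_stub`, the image size, and the vertex count are all hypothetical, and a real VPNet would be a region-based CNN rather than a single linear map.

```python
import numpy as np

rng = np.random.default_rng(0)

def vpnet_stub(image, weights, num_vertices):
    """Hypothetical stand-in for the vertex prediction network VPNet in
    formula (1): a single linear map from image pixels to a
    (num_vertices, 3) array of predicted x, y, z coordinates."""
    flat = image.reshape(-1)                      # flatten I_i
    return (weights @ flat).reshape(num_vertices, 3)

# assumed sizes: a 16x16 single-channel image, 32 predicted vertices
image = rng.random((16, 16))
weights = rng.random((32 * 3, 16 * 16))
vertices = vpnet_stub(image, weights, 32)         # V_hat in formula (1)
```

The only point the sketch illustrates is the shape contract: one image in, one (N, 3) coordinate set out.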
Step 103, extracting image frequency information based on the image to be processed to obtain high-frequency detail image information and low-frequency contour image information corresponding to the image to be processed, and superimposing the high-frequency detail image information and the low-frequency contour image information to obtain an image overall key information feature map.
It should be understood that image frequency is a measure of how strongly the gray values of pixels in an image vary. The gray value is the brightness of each pixel, with a range of [0, 255], where white is 255 and black is 0. The image frequency reflects the degree of difference between a pixel's gray value and that of its neighboring points: smooth gray-value changes correspond to low-frequency image information, while sharp changes correspond to high-frequency image information. For example, in an image containing a human face, the frequencies of the whole image represent the face contour and its detail information: the low-frequency information represents the overall contour of the face, and the high-frequency information represents detail such as wrinkles.
Further, to separate the high-frequency detail image information from the low-frequency contour image information, suppose the R, G, B values of image pixels have been extracted by an image rendering tool. The gray value x of a pixel can then be computed as x = 0.2989 × R + 0.5870 × G + 0.1140 × B. A threshold is then set: for example, a change between a pixel's gray value and its neighboring points in the range (0, 200) is treated as low frequency, and a change in the range (200, 255) as high frequency; the high-frequency detail image information and low-frequency contour image information corresponding to the image to be processed are obtained by this screening.
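The grayscale conversion and threshold-based screening described above can be sketched in NumPy as follows. This is a minimal illustration: the 4-neighbour difference and the function names are assumptions, since the patent does not specify exactly how the change relative to neighboring points is computed.

```python
import numpy as np

def to_gray(rgb):
    """Convert an RGB image (H, W, 3) to grayscale using the weights
    quoted in the text: x = 0.2989*R + 0.5870*G + 0.1140*B."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.2989 * r + 0.5870 * g + 0.1140 * b

def split_by_gradient(gray, threshold=200):
    """Illustrative separation: pixels whose gray-value change relative
    to the mean of their 4 neighbours stays below `threshold` go to the
    low-frequency map; the rest go to the high-frequency map."""
    padded = np.pad(gray, 1, mode="edge")
    neigh = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
             padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
    diff = np.abs(gray - neigh)
    low = np.where(diff < threshold, gray, 0.0)
    high = np.where(diff >= threshold, gray, 0.0)
    return low, high
```

Each pixel lands in exactly one of the two maps, so the split is lossless with respect to the original gray values.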
After the high-frequency detail image information and the low-frequency contour image information are obtained, they can be directly superimposed to form the image key information feature map. The superposition calculation formula is shown as formula (2):
$F_i = F_i^{low} + F_i^{high}$    (2)

where $F_i^{low}$ and $F_i^{high}$ denote the low-frequency contour image information and the high-frequency detail image information of the image, respectively, and $F_i$ denotes the image key frequency information feature map formed by their superposition.
In a specific implementation, the high-frequency detail image information and the low-frequency contour image information can first be processed separately, removing noise interference from the frequency features and enhancing the features of each. Finally, the processed high-frequency detail image information and low-frequency contour image information are superimposed to form the overall key information feature map of the image.
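A minimal sketch of this separate-process-then-superimpose step, where an L1 scaling stands in for the regularization (an assumption; the patent only says an L1 norm is used) and the final addition is formula (2):

```python
import numpy as np

def l1_normalize(feat, eps=1e-8):
    """Assumed stand-in for the L1 regularization step: scale the
    feature map so its absolute values sum to 1."""
    return feat / (np.sum(np.abs(feat)) + eps)

def fuse_key_frequency(low, high):
    """Formula (2): superimpose the separately regularized low- and
    high-frequency maps into the key frequency information feature map."""
    return l1_normalize(low) + l1_normalize(high)
```

Processing each branch before the addition is what keeps one branch's noise from contaminating the other, which is the stated motivation for separating them.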
Specifically, as an optional implementation, extracting image frequency information based on the image to be processed to obtain the high-frequency detail image information and low-frequency contour image information corresponding to the image to be processed, and superimposing them to obtain the image overall key information feature map, includes:

acquiring a diffuse reflection map of the image to be processed; separating the high-frequency and low-frequency information of the diffuse reflection map to obtain the high-frequency detail image information and low-frequency contour image information corresponding to the image to be processed; regularizing the high-frequency detail image information and the low-frequency contour image information separately to obtain regularized high-frequency detail image information and regularized low-frequency contour image information; and superimposing the regularized high-frequency detail image information and low-frequency contour image information to form the image key frequency information feature map.
It should be understood that the diffuse reflection map contains the inherent color and texture information of the image. Since the texture information in the diffuse reflection map contains no illumination or shadow occlusion, the influence of such invalid information on the three-dimensional reconstruction result is reduced to a certain extent. The calculation formula for extracting the diffuse reflection map is shown as formula (3):
$D_i = \mathrm{DiffNet}(I_i)$    (3)

where $D_i$ denotes the diffuse reflection map of the image to be processed $I_i$, and DiffNet denotes the diffuse reflection map network that generates it; the diffuse reflection map network may adopt an end-to-end supervised neural network model, such as a CNN (Convolutional Neural Network).
It should also be understood that if the high- and low-frequency information of the image were processed together, their respective characteristics would be damaged and the image would become blurred; therefore the high- and low-frequency information must first be separated and then regularized individually to enhance the frequency characteristics of each. The calculation formula for the separation is shown as formula (4):
$(F_i^{low}, F_i^{high}) = \mathrm{FDAE}(D_i)$    (4)

where $F_i^{low}$ and $F_i^{high}$ denote the low-frequency contour image information and high-frequency detail image information decomposed from the diffuse reflection map $D_i$ of the image to be processed, and FDAE denotes the frequency decoupling algorithm (Frequency Decoupling Auto-Encoder).
By way of example and not limitation, the frequency decoupling algorithm may be a fourier transform or wavelet transform, or the like.
Specifically, the wavelet toolbox in MATLAB for image decomposition and reconstruction can be used to separate the high- and low-frequency information in the image texture. The wavelet toolbox performs image frequency decoupling as follows: select the diffuse reflection map of the image to be processed and preprocess it with the sym4 wavelet of the wavelet toolbox (loading the image, displaying it, converting the format, and so on); decompose the preprocessed image into high- and low-frequency information by calling the wavedec function; during decomposition, extract the high- and low-frequency coefficients with the detcoef and appcoef functions respectively; reconstruct the high-frequency and low-frequency image information from those coefficients with the wrcoef function; regularize the high- and low-frequency information separately using the L1 norm in MATLAB; and finally superimpose the resulting high-frequency detail image information and low-frequency contour image information. The superposition formula is the same as formula (2) and is not repeated here.
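The patent's pipeline uses MATLAB's wavelet toolbox with a sym4 wavelet. As an illustrative analogue, here is a one-level Haar decomposition in plain NumPy that plays the role of wavedec/appcoef/detcoef (low- and high-frequency coefficients) and wrcoef (reconstruction). The Haar wavelet and the function names are assumptions, chosen only because Haar is simple enough to write out by hand:

```python
import numpy as np

def haar_1d(signal):
    """One-level 1-D Haar decomposition: returns (approximation, detail),
    i.e. the low-frequency and high-frequency coefficients."""
    s = np.asarray(signal, dtype=float)
    even, odd = s[0::2], s[1::2]
    approx = (even + odd) / np.sqrt(2)   # low-frequency (contour) part
    detail = (even - odd) / np.sqrt(2)   # high-frequency (detail) part
    return approx, detail

def haar_reconstruct(approx, detail):
    """Inverse transform -- single-level analogue of wrcoef."""
    even = (approx + detail) / np.sqrt(2)
    odd = (approx - detail) / np.sqrt(2)
    out = np.empty(even.size + odd.size)
    out[0::2], out[1::2] = even, odd
    return out
```

A 2-D image decomposition applies the same transform along rows and then columns; the essential property is the same: the signal splits losslessly into a smooth (low-frequency) and a detail (high-frequency) part that can be processed separately and recombined.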
Step 104, displacing the predicted vertex coordinates based on the image key frequency information feature map to obtain corrected predicted vertex coordinates, and performing three-dimensional reconstruction based on the corrected predicted vertex coordinates to obtain a target three-dimensional model corresponding to the target object.
It should be noted that building the target three-dimensional model directly from the predicted vertex coordinates does not show much detail of the target object; although tessellation adds vertices to the model, it does not add detail, because the new vertices lie in the same planes as the original ones. To add detail to the target three-dimensional model, the predicted vertex coordinates must be displaced in some way. The predicted vertices can therefore be adjusted using the frequency information contained in the existing key information feature map: because the pixels of the image key frequency information feature map correspond one-to-one to the predicted vertex coordinates, the feature map can be used to determine the direction and the displacement amount by which each predicted vertex coordinate is adjusted.
It should be understood that the image to be processed contains light reflection information. For a pixel point in the image, the illumination generates a line perpendicular to the tangential plane of the pixel point, and because the reflection of light is irregular, the line has a direction. The direction value n is stored in the red, green and blue channels as compressed x-, y- and z-axis coordinate values; the direction values n formed in this way are consistent in number with, and in one-to-one correspondence to, the vertices of the target three-dimensional model, so the adjustment direction of each predicted vertex coordinate can be confirmed from the direction value n. Assuming the x-axis component of a direction is a value in the range [-1, 1], its compressed channel value is obtained by the formula (0.5x + 0.5) × 255, and the y-axis and z-axis components can be compressed by the same formula; the vector formed by the three values constitutes the direction value n, and the adjusted direction of a predicted vertex is obtained by applying the direction value n to the corresponding predicted vertex x, y, z coordinate values.
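A minimal sketch of the compression formula above (function names are illustrative, not from the patent): each direction component in [-1, 1] is packed into a [0, 255] color channel and recovered by the inverse mapping.

```python
import numpy as np

def encode_normal(n):
    """Pack direction components in [-1, 1] into [0, 255] channel values
    via the formula (0.5 * x + 0.5) * 255."""
    return np.round((0.5 * np.asarray(n, dtype=float) + 0.5) * 255.0)

def decode_normal(rgb):
    """Recover the direction value n from the stored channel values."""
    return np.asarray(rgb, dtype=float) / 255.0 * 2.0 - 1.0

n = np.array([0.0, 1.0, -1.0])   # example unit-range direction components
rgb = encode_normal(n)           # channel values stored in red, green, blue
n_back = decode_normal(rgb)      # round-trip recovers n up to quantization
```

The round trip loses at most one quantization step (about 1/255 per component), which is why 8-bit color channels suffice to store the direction values.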
It should also be understood that the image frequency stored in the image key frequency information feature map measures the intensity of the change of the gray level of the pixels in the image, and this intensity can be used to describe the height of the protrusions and depressions of the object surface. This is defined as a height value h, stored per pixel in a single color channel, where h is in the range [0, 255], white is represented by 0, and black is represented by 255. A common gray-level extraction formula based on human eye perception can be adopted, taking a weighted average of the RGB values of the pixel points: h = 0.2126R + 0.7152G + 0.0722B, where the weights are derived from the human eye's sensitivity to different colors.
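The weighted-average formula above can be written directly (the function name is illustrative; the weights are the formula's own):

```python
def height_from_rgb(r, g, b):
    """Perceptual gray-level height value: h = 0.2126*R + 0.7152*G + 0.0722*B."""
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

h_black = height_from_rgb(0, 0, 0)        # 0.0
h_white = height_from_rgb(255, 255, 255)  # ≈ 255, since the weights sum to 1
```

Because the three weights sum to exactly 1, the height value stays in the same [0, 255] range as the input channels.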
Further, the predicted vertex coordinates are displaced by the obtained displacement direction of the predicted vertex coordinates and the displacement value in the displacement direction, and the calculation formula is shown as (5):
p′ = p + (h − 1)n    (5)
wherein p is the current predicted vertex coordinate, p′ is the corrected predicted vertex coordinate, n is the displacement direction value by which the predicted vertex coordinate is to be adjusted, and h is the height value of the displacement. Here the height value h ∈ [0, 1]; subtracting 1 from h maps its interval [0, 1] to [−1, 0], because the normal vector of the surface usually faces the outside of the mesh, which means the outward offset is replaced by an inward offset, since it is generally more convenient to push the geometry in than to pull it out. In general, the corrected predicted vertex coordinate is obtained as the sum of the current predicted vertex coordinate p and the product of the displacement direction value n and the offset (h − 1).
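Formula (5) can be sketched in a few lines of NumPy (a minimal sketch; the example vertex and normal are made up for illustration):

```python
import numpy as np

def displace(p, n, h):
    """Formula (5): p' = p + (h - 1) * n, with height value h in [0, 1],
    so vertices are pushed inward along the direction value n."""
    return np.asarray(p, dtype=float) + (h - 1.0) * np.asarray(n, dtype=float)

p = np.array([1.0, 2.0, 3.0])     # current predicted vertex coordinate
n = np.array([0.0, 0.0, 1.0])     # direction value from the normal map
p_top = displace(p, n, 1.0)       # h = 1: vertex stays in place
p_deep = displace(p, n, 0.0)      # h = 0: vertex pushed fully inward by -n
```

With h = 1 the vertex is unchanged, and with h = 0 it is shifted by the full −n, matching the [−1, 0] offset interval described above.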
Furthermore, the corrected predicted vertex coordinates are connected with adjacent predicted vertex coordinates to form a grid, and the grid connected with all the predicted vertex coordinates forms a target three-dimensional model corresponding to the target object.
Specifically, as an optional implementation manner, the step of displacing the predicted vertex coordinates based on the image key frequency information feature map to obtain corrected predicted vertex coordinates, and performing three-dimensional reconstruction based on the corrected predicted vertex coordinates to obtain a target three-dimensional model corresponding to the target object, includes:
extracting a displacement map from the image key frequency information characteristic map, wherein the displacement map comprises displacement corresponding to the displacement of the predicted vertex coordinate along the normal direction of the predicted vertex coordinate; displacing the predicted vertex coordinate along the normal direction according to the obtained displacement to obtain a corrected predicted vertex coordinate; and constructing a target three-dimensional model corresponding to the target object based on the corrected predicted vertex coordinates.
It should be understood that the displacement map, also called a height map, can be used to describe the protrusions and depressions of the object surface; each pixel point of the displacement map uses one color channel to store a height value h. Each pixel point of the normal map uses three color channels to store, for the illumination at that pixel, a direction value n perpendicular to the tangent plane of the pixel point. Therefore, by obtaining the displacement amount contained in the displacement map and the direction value contained in the normal map, displacement optimization can be performed on the predicted vertex coordinates according to a definite displacement direction and displacement amount.
By way of example and not limitation, on the one hand, a displacement map may be created from an input image to be processed of the target object through drawing software such as Photoshop; on the other hand, a displacement map of the image to be processed may be generated by inputting the image to be processed of the target object into a neural network model, for example, a Convolutional Neural Network (CNN) or a Fast Region-based Convolutional Neural Network (Fast R-CNN), whose performance is enhanced relative to R-CNN. The generated displacement map includes the displacement amount by which each predicted vertex coordinate is displaced along its normal direction, and the predicted vertex coordinates are corrected by this displacement amount; the calculation formula is shown as (6):
M = Displacement(F, V)    (6)
wherein Displacement(·) represents the operation that displaces the predicted vertices, M represents the generated target three-dimensional model, F represents the image key frequency information feature map formed by superposition, and V represents the predicted vertex coordinates.
As an example and not by way of limitation, the normal map may be obtained by directly inputting the to-be-processed image into the "3D" sub-menu of the "filter" tool in Photoshop, or by training a neural network such as a Generative Adversarial Network (GAN) to obtain the normal map of the image.
Specifically, the corrected predicted vertex coordinates are obtained by obtaining the height value h in the displacement map and the displacement direction value n in the normal map, and displacing the predicted vertex coordinates by the height value h along the direction value n; the calculation formula is consistent with formula (5) and is not described herein again.
Further, as an example and not by way of limitation, an Open Graphics library (OpenGL) or Computer Aided Design (CAD) software may be adopted, the obtained predicted vertex coordinates are input, and then some other auxiliary information is set, so as to directly obtain a target three-dimensional model corresponding to the target object, which is not described herein again.
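As a minimal illustration of turning corrected predicted vertex coordinates and their mesh connectivity into a model that OpenGL or CAD tools can load, the sketch below writes a Wavefront OBJ string. The OBJ format is chosen here for illustration only; the patent names OpenGL and CAD software, not a specific file format.

```python
def to_obj(vertices, faces):
    """Serialize vertices (x, y, z) and triangle faces (0-based index
    triples) into Wavefront OBJ text. OBJ face indices are 1-based."""
    lines = ["v {:.6f} {:.6f} {:.6f}".format(*v) for v in vertices]
    lines += ["f {} {} {}".format(a + 1, b + 1, c + 1) for a, b, c in faces]
    return "\n".join(lines) + "\n"

obj_text = to_obj(
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],  # corrected vertices
    [(0, 1, 2)],                                          # one triangular mesh
)
```

The resulting text can be written to a `.obj` file and opened in most CAD or graphics tools to inspect the reconstructed target three-dimensional model.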
In the embodiment, firstly, an image to be processed of a target object is obtained, and then, the vertex of a three-dimensional model is predicted based on the image to be processed, so that the coordinate of a predicted vertex is obtained; simultaneously, extracting image frequency information based on the image to be processed to obtain high-frequency detail image information and low-frequency contour image information corresponding to the image to be processed, and superposing the high-frequency detail image information and the low-frequency contour image information to obtain an image key frequency information characteristic diagram; and displacing the predicted vertex coordinates based on the image key frequency information characteristic diagram to obtain corrected predicted vertex coordinates, and performing three-dimensional reconstruction based on the corrected predicted vertex coordinates to obtain a target three-dimensional model corresponding to the target object. In the process, the predicted vertex coordinates are adjusted through the frequency information contained in the key frequency information characteristic diagram extracted from the image to be processed, so that the detail and contour information of the generated target three-dimensional model are increased, and the accuracy and the integrity of the generated target three-dimensional model are ensured.
Different embodiments of the three-dimensional reconstruction method are also provided in the embodiments of the present application.
Referring to fig. 2, fig. 2 is a second flowchart of a three-dimensional reconstruction method provided in an embodiment of the present application. As shown in fig. 2, the three-dimensional reconstruction method includes the following steps:
The implementation process of step 201 is the same as that of step 101 in the foregoing embodiment, and is not described here again.
The implementation process of step 202 is the same as that of step 102 in the foregoing embodiment, and is not described here again.
And step 203, performing up-sampling on the preliminary predicted vertex coordinates, and performing down-sampling on the up-sampled preliminary predicted vertex coordinates.
It is to be understood that, in the field of three-dimensional reconstruction, up-sampling of a three-dimensional model is based on the following idea: two points among the vertex coordinates of the three-dimensional model determine a line, and three points determine a plane; three vertices of the three-dimensional model form a triangular mesh, and the mesh is the basic processing unit of the three-dimensional model. Suppose a mesh formed by three vertices exists; the up-sampling operation then interpolates vertices inside the mesh region formed by those three vertices, increasing the number of vertices, and the triangular mesh is split into multiple meshes formed by the interpolated vertices. In effect, the increased meshes add detail to the surface of the three-dimensional model, making the model smoother and more delicate. The up-sampling operation can adopt bilinear interpolation, transposed convolution and the like, and its calculation formula is shown as (7):
V_up = Upsampling(V)    (7)
wherein V represents the predicted vertex coordinates of the target object to be up-sampled, and V_up represents the set of predicted vertex coordinates that have undergone the up-sampling operation.
Furthermore, the vertex coordinates after the up-sampling operation are down-sampled to keep the same dimension as the target object. Down-sampling ensures that the number of vertices is consistent with the number of predicted vertices acquired through the vertex prediction network, and the down-sampling operation can be performed on the vertex coordinate set by random down-sampling or the like, so that the vertex prediction network can subsequently be optimized by the first preset loss function.
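The up-/down-sampling round trip above can be sketched as follows. This is a minimal sketch under assumptions: edge-midpoint interpolation stands in for the bilinear-interpolation or transposed-convolution up-sampling the text mentions, and random down-sampling restores the original vertex count.

```python
import numpy as np

def upsample(vertices, faces):
    """Insert the midpoint of every edge of every triangular mesh,
    increasing the number of vertices (one simple form of up-sampling)."""
    vertices = np.asarray(vertices, dtype=float)
    new = [vertices]
    for a, b, c in faces:
        for i, j in ((a, b), (b, c), (c, a)):
            new.append(((vertices[i] + vertices[j]) / 2.0)[None, :])
    return np.vstack(new)

def downsample(verts, n, seed=0):
    """Randomly keep n vertices so the count matches the prediction network."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(verts), size=n, replace=False)
    return verts[idx]

v = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
up = upsample(v, [(0, 1, 2)])   # 3 original vertices + 3 edge midpoints
down = downsample(up, len(v))   # back to the original vertex count
```

One triangle gains three midpoints, splitting it into four smaller meshes, which is the detail-adding effect described above.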
And 204, combining the real vertex coordinates of the target object, and performing vertex prediction constraint on the down-sampled preliminary prediction vertex coordinates by using a first preset loss function to obtain prediction vertex coordinates, wherein the first preset loss function is used for calculating a difference value between the real vertex coordinates and the down-sampled preliminary prediction vertex coordinates.
As can be seen by combining the arrow information pointing to the "first preset loss function" in fig. 3, calculating the first preset loss function requires the preliminary predicted vertex coordinates after up-sampling and down-sampling and the real vertex coordinates of the target object used for training the vertex prediction network VPNet. The difference between the two is calculated by the first preset loss function, and the vertex prediction network parameters are continuously adjusted according to this difference, so that the predicted vertex coordinates approach the real vertex coordinates as closely as possible until the difference between the two is minimized.
By way of example and not limitation, the first preset loss function may be a mean-square error (MSE) function; in addition, the first preset loss function may employ the average absolute error, the absolute value loss function, the logarithmic loss function or the like as needed. The first preset loss function calculated using the mean square error is shown as (8):
L1 = MSE(V_i, Downsampling(V_i^up))    (8)
where MSE is the mean square error, Downsampling(·) denotes the down-sampling operation performed on the up-sampled preliminary predicted vertex coordinates, V_i is the real vertex coordinate, and V_i^up is the preliminary predicted vertex coordinate corresponding to the real vertex coordinate V_i after the up-sampling operation.
It should be understood that the mean square error MSE is the mean of the sum of squares of the differences between the predicted values and the true values of the parameter; the smaller the obtained value, the more accurate the current vertex prediction network model. The parameters in the vertex prediction network model can therefore be adjusted by predicting the vertex coordinates many times and observing the change of the predicted values, so as to obtain a more accurate vertex prediction effect.
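A minimal sketch of the mean square error used in formula (8) (the toy vertex arrays are made up for illustration):

```python
import numpy as np

def mse(real, pred):
    """Mean square error between real and predicted vertex coordinates."""
    real = np.asarray(real, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return float(np.mean((real - pred) ** 2))

loss = mse([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]],   # real vertex coordinates
           [[0.0, 0.0, 0.0], [1.0, 1.0, 2.0]])   # down-sampled predictions
```

Only one of the six coordinates differs (by 1), so the loss is 1/6; a perfect prediction gives exactly 0, the minimum the training loop drives toward.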
The implementation process of step 205 is the same as that of step 103 in the foregoing embodiment, and is not described here again.
And step 206, displacing the predicted vertex coordinates based on the image key frequency information characteristic diagram to obtain corrected predicted vertex coordinates, and performing three-dimensional reconstruction based on the corrected predicted vertex coordinates to obtain a target three-dimensional model corresponding to the target object.
The implementation process of this step is the same as that of step 104 in the foregoing embodiment, and is not described here again.
In the embodiment of the application, an image to be processed of a target object is obtained, the image to be processed is input to a vertex prediction network to obtain a preliminary prediction vertex coordinate, the preliminary prediction vertex coordinate is subjected to up-sampling and down-sampling, and then vertex prediction constraint is performed on the preliminary prediction vertex coordinate after down-sampling by using a preset loss function to obtain a prediction vertex coordinate; simultaneously, extracting image frequency information based on the image to be processed to obtain high-frequency detail image information and low-frequency contour image information corresponding to the image to be processed, and superposing the high-frequency detail image information and the low-frequency contour image information to obtain an image key frequency information characteristic diagram; and displacing the predicted vertex coordinates based on the image key frequency information characteristic diagram to obtain corrected predicted vertex coordinates, and performing three-dimensional reconstruction based on the corrected predicted vertex coordinates to obtain a target three-dimensional model corresponding to the target object. In the process, the predicted vertex coordinates are adjusted through the frequency information contained in the key frequency information characteristic diagram extracted from the image to be processed, the detail and contour information of the generated target three-dimensional model are increased, further, the predicted vertex coordinates are constrained by adopting a loss function, the accuracy of the predicted vertex coordinates is ensured, and the accuracy and the robustness of the generated target three-dimensional model are ensured on the basis of the constraint.
When the image key frequency information feature map is obtained based on the diffuse reflection map of the image to be processed, with reference to fig. 4, after the predicted vertex coordinates are displaced based on the image key frequency information feature map to obtain corrected predicted vertex coordinates and three-dimensional reconstruction is performed based on the corrected predicted vertex coordinates to obtain the target three-dimensional model corresponding to the target object, the method further includes:
It should be understood that the pixel points of the diffuse reflection map include the inherent color and texture of the image, and the pixel points on the two-dimensional plane map can be represented by u and v coordinates, wherein the horizontal direction is u and the vertical direction is v; and vertex coordinates x, y and z of the target three-dimensional model, wherein pixel points of the diffuse reflection map and vertexes of the target three-dimensional model are in one-to-one correspondence, and based on the vertex coordinates, positions including u and v coordinates can be pasted to the vertex coordinates of the corresponding target three-dimensional model, so that the target three-dimensional model can obtain the inherent color and texture in the diffuse reflection map.
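The u, v pasting described above can be sketched as a per-vertex texture lookup. This is a minimal sketch under assumptions: nearest-pixel lookup and a tiny 4×4 texture are illustrative choices, not the patent's sampling scheme.

```python
import numpy as np

def sample_diffuse(texture, uv):
    """texture: (H, W, 3) diffuse map; uv: (N, 2) coordinates in [0, 1].
    Returns the (N, 3) inherent colors pasted onto the corresponding
    target three-dimensional model vertices (nearest-pixel lookup)."""
    h, w = texture.shape[:2]
    u = np.clip(np.round(uv[:, 0] * (w - 1)).astype(int), 0, w - 1)
    v = np.clip(np.round(uv[:, 1] * (h - 1)).astype(int), 0, h - 1)
    return texture[v, u]

tex = np.zeros((4, 4, 3))
tex[0, 0] = [1.0, 0.0, 0.0]                      # red at the (u=0, v=0) corner
uv = np.array([[0.0, 0.0], [1.0, 1.0]])          # one vertex per corner
colors = sample_diffuse(tex, uv)                 # per-vertex inherent colors
```

Because pixel points and vertices are in one-to-one correspondence, applying this lookup over all vertices transfers the diffuse map's inherent color and texture onto the model.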
Specifically, as can be seen by combining the arrow information pointing to the "second preset loss function" in fig. 3, the second preset loss function compares, at the same pixel positions, the corresponding two-dimensional image obtained by projecting the target three-dimensional model into the two-dimensional pixel space with the to-be-processed image I_i of the target object, and the target three-dimensional model is optimized by calculating the pixel value difference between them. The second preset loss function uses the calculation formula of the mean square error as shown in (9):
L2 = MSE(Render(M, T), I_i)    (9)
wherein Render(·) represents a rendering operation, which renders a three-dimensional object into a corresponding two-dimensional image for processing, M is the target three-dimensional model, T is the diffuse reflection map of the image to be processed, and I_i is the image to be processed. Similarly, the second preset loss function may also adopt the average absolute error, the absolute value loss function, the logarithmic loss function and the like as needed, consistent with the first preset loss function in step 204, and is not described herein again.
In this embodiment, the diffuse reflection map is mapped to the target three-dimensional model to obtain a target three-dimensional model with inherent color and texture, the target three-dimensional model with inherent color and texture is projected into the two-dimensional pixel space to obtain a rendered two-dimensional image, and then, based on the two-dimensional image and in combination with the pixel values of the image to be processed, a second preset loss function is used to perform pixel consistency constraint on the target three-dimensional model to obtain an optimized target three-dimensional model; the second preset loss function is used for calculating the pixel difference value between the two-dimensional image and the image to be processed. By calculating the pixel difference value between the images and optimizing the target three-dimensional model with this difference value, the accuracy and precision of the target three-dimensional model are improved.
Further, with reference to fig. 5, after the predicted vertex coordinates are displaced based on the image key frequency information feature map to obtain the corrected predicted vertex coordinates and three-dimensional reconstruction is performed based on the corrected predicted vertex coordinates to obtain the target three-dimensional model corresponding to the target object, the method may further include:
It should be understood that the image frequency information is extracted from the corresponding two-dimensional image obtained by projecting the target three-dimensional model, to which the diffuse reflection map has not been applied, into the two-dimensional pixel space, so as to avoid the influence of information such as the inherent color and texture contained in the diffuse reflection map.
It should be understood that, as can be known from the arrow information pointing to the "third preset loss function" in fig. 3, the third preset loss function compares the image frequency information contained in the corresponding two-dimensional image, obtained by projecting the target three-dimensional model into the two-dimensional pixel space, with the image frequency information contained in the image to be processed of the target object, and calculates the image frequency difference value between the two-dimensional image and the image to be processed to optimize the target three-dimensional model. The third preset loss function uses the calculation formula of the mean square error function as shown in (10):
L3 = MSE(Extract(Render(M)), Extract(I_i))    (10)
wherein Extract(·) represents the extraction of image frequency information, Extract(I_i) is the image frequency information of the image to be processed of the target object, and Render(M) is the two-dimensional image acquired by projecting the target three-dimensional model M into the two-dimensional space. Similarly, the third preset loss function may also adopt the average absolute error, the absolute value loss function, the logarithmic loss function and the like as needed, consistent with the first preset loss function in step 204, and is not described herein again.
In this embodiment, the target three-dimensional model is projected into the two-dimensional pixel space to obtain a two-dimensional image rendered from the target three-dimensional model; based on the image frequency information extracted from the two-dimensional image, and with reference to the image frequency information extracted from the image to be processed, image frequency consistency constraint is performed on the target three-dimensional model by using a third preset loss function to obtain an optimized target three-dimensional model. The third preset loss function is used for calculating the image frequency difference value between the two-dimensional image and the image to be processed, and the target three-dimensional model is optimized by using this difference value, so that the accuracy and precision of the target three-dimensional model are improved.
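A minimal sketch of this image frequency consistency constraint follows. It is built on assumptions: a simple 4-neighbour Laplacian high-pass filter stands in for the patent's wavelet-based Extract step, and the images are random test data rather than rendered outputs.

```python
import numpy as np

def extract_frequency(img):
    """Crude stand-in for Extract(.): 4-neighbour Laplacian high-pass
    response of a 2-D grayscale image (zero at the border)."""
    out = np.zeros_like(img, dtype=float)
    out[1:-1, 1:-1] = (4.0 * img[1:-1, 1:-1]
                       - img[:-2, 1:-1] - img[2:, 1:-1]
                       - img[1:-1, :-2] - img[1:-1, 2:])
    return out

def frequency_loss(rendered, target):
    """Formula (10)-style loss: MSE between the frequency information of
    the rendered two-dimensional image and the image to be processed."""
    diff = extract_frequency(rendered) - extract_frequency(target)
    return float(np.mean(diff ** 2))

rng = np.random.default_rng(1)
img = rng.random((8, 8))
zero_loss = frequency_loss(img, img)   # identical images: loss is exactly 0
```

Identical images give zero loss, and the loss grows as the rendered image's high-frequency detail diverges from the image to be processed, which is the signal used to optimize the model.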
Further, in order to realize better three-dimensional reconstruction, an overall optimization loss function of the three-dimensional reconstruction method is designed according to the three preset loss functions:
specifically, a first preset loss function is set to have a first weight, a second preset loss function is set to have a second weight, and a third preset function is set to have a third weight, wherein the first weight, the second weight, and the third weight are all adjustable weights. Specifically, as shown in formula (11):
Loss = λ1·L1 + λ2·L2 + λ3·L3    (11)
wherein Loss represents the overall optimization objective, and λ1, λ2 and λ3 respectively represent the weights of the first preset loss function, the second preset loss function and the third preset loss function; the three-dimensional reconstruction model can have different reconstruction effects through adjustment of the weights. For example, increasing the weight λ1 of the first preset loss function yields more accurate predicted vertex coordinates; increasing the weight λ2 of the second preset loss function optimizes the rendered image of the three-dimensional reconstruction model; and increasing the weight λ3 of the third preset loss function makes the detail information of the three-dimensional reconstruction model more prominent. A better three-dimensional reconstruction effect can be obtained by adjusting the weight values of the three loss functions.
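The weighted combination in formula (11) reduces to one line of code (the numeric loss values and weights below are made up for illustration):

```python
def overall_loss(l1, l2, l3, lam1=1.0, lam2=1.0, lam3=1.0):
    """Formula (11): weighted sum of the three preset loss functions,
    with adjustable weights lam1, lam2, lam3."""
    return lam1 * l1 + lam2 * l2 + lam3 * l3

# Example: emphasize vertex accuracy (lam1) over detail (lam3).
total = overall_loss(0.5, 0.2, 0.3, lam1=2.0, lam2=1.0, lam3=0.5)
```

Tuning the three weights trades off vertex accuracy, rendered-image fidelity and surface detail, as described above.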
Referring to fig. 6, fig. 6 is a structural diagram of a three-dimensional reconstruction apparatus provided in an embodiment of the present application; for convenience of description, only the parts related to the embodiment of the present application are shown.
The three-dimensional reconstruction apparatus 600 includes:
an obtaining module 601, configured to obtain an image to be processed of a target object;
the prediction module 602 is configured to perform three-dimensional model vertex prediction based on an image to be processed to obtain a prediction vertex coordinate;
the extraction module 603 is configured to extract image frequency information based on the image to be processed, obtain high-frequency detail image information and low-frequency contour image information corresponding to the image to be processed, and superimpose the high-frequency detail image information and the low-frequency contour image information to obtain an image overall key information feature map;
and the three-dimensional reconstruction module 604 is used for displacing the predicted vertex coordinates based on the image key frequency information characteristic diagram to obtain corrected predicted vertex coordinates, and performing three-dimensional reconstruction based on the corrected predicted vertex coordinates to obtain a target three-dimensional model corresponding to the target object.
The prediction module 602 is specifically configured to:
inputting an image to be processed into a vertex prediction network to obtain a preliminary prediction vertex coordinate of the image to be processed;
up-sampling the preliminary predicted vertex coordinates, and down-sampling the up-sampled preliminary predicted vertex coordinates;
the extracting module 603 is specifically configured to:
acquiring a diffuse reflection map of the image to be processed based on the image to be processed;
separating high-frequency and low-frequency information of the diffuse reflection mapping to obtain high-frequency detail image information and low-frequency contour image information corresponding to the image to be processed;
respectively carrying out regularization processing on the high-frequency detail image information and the low-frequency contour image information to obtain regularized high-frequency detail image information and regularized low-frequency contour image information;
and superposing the regularized high-frequency detail image information and the regularized low-frequency contour image information to form the image key frequency information feature map.
Correspondingly, the extracting module 603 is further configured to:
mapping the diffuse reflection map to a target three-dimensional model to obtain a target three-dimensional model with inherent color and texture, and projecting the target three-dimensional model with inherent color and texture to a two-dimensional pixel space to obtain a rendered two-dimensional image;
based on the two-dimensional image, combining the pixel values of the image to be processed, and utilizing a second preset loss function to perform pixel consistency constraint on the target three-dimensional model to obtain an optimized target three-dimensional model; and the second preset loss function is used for calculating the pixel difference value between the two-dimensional image and the image to be processed.
The three-dimensional reconstruction module 604 is specifically configured to:
extracting a displacement map from the image key frequency information characteristic map, wherein the displacement map comprises displacement corresponding to the displacement of the predicted vertex coordinate along the normal direction of the predicted vertex coordinate;
displacing the predicted vertex coordinates along the normal direction according to the obtained displacement amount to obtain corrected predicted vertex coordinates;
and constructing a target three-dimensional model corresponding to the target object based on the corrected predicted vertex coordinates.
And combining the real vertex coordinates of the target object, and performing vertex prediction constraint on the down-sampled preliminary prediction vertex coordinates by using a first preset loss function to obtain prediction vertex coordinates, wherein the first preset loss function is used for calculating a difference value between the real vertex coordinates and the down-sampled preliminary prediction vertex coordinates.
Correspondingly, the three-dimensional reconstruction module 604 is further configured to:
projecting the target three-dimensional model to a two-dimensional pixel space to obtain a two-dimensional image rendered by the target three-dimensional model;
respectively extracting image frequency information from the image to be processed and the two-dimensional image, referring to the extracted image frequency information of the image to be processed based on the extracted image frequency information of the two-dimensional image, and performing image frequency consistency constraint on the target three-dimensional model by using a third preset loss function to obtain an optimized target three-dimensional model; and the third preset loss function is used for calculating an image frequency difference value between the two-dimensional image and the image to be processed.
The three-dimensional reconstruction apparatus provided in the embodiment of the present application can realize each process of the foregoing three-dimensional reconstruction method embodiments and can achieve the same technical effect, which is not repeated here to avoid repetition.
Fig. 7 is a block diagram of a terminal according to an embodiment of the present application. As shown in the figure, the terminal 7 of this embodiment includes: at least one processor 70 (only one shown in fig. 7), a memory 71, and a computer program 72 stored in the memory 71 and executable on the at least one processor 70, the processor 70 implementing the steps in any of the various method embodiments described above when executing the computer program 72.
The terminal 7 may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing devices. The terminal 7 may include, but is not limited to, a processor 70 and a memory 71. It will be appreciated by those skilled in the art that fig. 7 is only an example of the terminal 7 and does not constitute a limitation of the terminal 7, which may comprise more or fewer components than those shown, or combine some components, or have different components; for example, the terminal may further comprise input/output devices, network access devices, buses, etc.
The Processor 70 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 71 may be an internal storage unit of the terminal 7, such as a hard disk or a memory of the terminal 7. The memory 71 may also be an external storage device of the terminal 7, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) and the like provided on the terminal 7. Further, the memory 71 may also include both an internal storage unit and an external storage device of the terminal 7. The memory 71 is used for storing the computer program and other programs and data required by the terminal. The memory 71 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above division into functional units and modules is illustrated; in practical applications, the above functions may be distributed among different functional units and modules as needed, that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for convenience of distinguishing them from each other and do not limit the protection scope of the present application. For the specific working processes of the units and modules in the system, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal and method may be implemented in other ways. For example, the apparatus/terminal embodiments described above are merely illustrative: the division into modules or units is only one logical division, and other divisions are possible in actual implementation; multiple units or components may be combined or integrated into another system, and some features may be omitted or not executed. In addition, the couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or units, and may be electrical, mechanical, or in other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated modules/units are implemented in the form of software functional units and sold or used as separate products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow in the methods of the embodiments described above may be implemented by a computer program, which may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals.
All or part of the processes in the methods of the above embodiments may also be implemented by a computer program product; when the computer program product runs on a terminal, the terminal, in executing it, carries out the steps in the above method embodiments.
The above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be replaced by equivalents; such modifications and substitutions do not depart in substance from the spirit and scope of the embodiments of the present application and are intended to be included within its scope.
Claims (10)
1. A method of three-dimensional reconstruction, comprising:
acquiring an image to be processed of a target object;
performing three-dimensional model vertex prediction based on the image to be processed to obtain predicted vertex coordinates;
extracting image frequency information from the image to be processed to obtain high-frequency detail image information and low-frequency contour image information corresponding to the image to be processed, and superposing the high-frequency detail image information and the low-frequency contour image information to obtain an image key frequency information feature map;
and displacing the predicted vertex coordinates based on the image key frequency information feature map to obtain corrected predicted vertex coordinates, and performing three-dimensional reconstruction based on the corrected predicted vertex coordinates to obtain a target three-dimensional model corresponding to the target object.
2. The three-dimensional reconstruction method according to claim 1, wherein performing three-dimensional model vertex prediction based on the image to be processed to obtain the predicted vertex coordinates comprises:
inputting the image to be processed into a vertex prediction network to obtain preliminary predicted vertex coordinates of the image to be processed;
up-sampling the preliminary predicted vertex coordinates, and down-sampling the up-sampled preliminary predicted vertex coordinates;
and, in combination with the real vertex coordinates of the target object, performing vertex prediction constraint on the down-sampled preliminary predicted vertex coordinates by using a first preset loss function to obtain the predicted vertex coordinates, wherein the first preset loss function is used for calculating a difference between the real vertex coordinates and the down-sampled preliminary predicted vertex coordinates.
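The up-sampling, down-sampling, and first-preset-loss steps of claim 2 can be sketched as follows. This is an illustrative assumption, not the patented implementation: midpoint-insertion up-sampling, stride-2 down-sampling, and a mean absolute difference are all stand-ins, since the claim fixes none of these choices.

```python
import numpy as np

def upsample_vertices(verts):
    # Illustrative up-sampling: insert the midpoint between consecutive vertices.
    mids = (verts[:-1] + verts[1:]) / 2.0
    out = np.empty((2 * len(verts) - 1, verts.shape[1]))
    out[0::2] = verts
    out[1::2] = mids
    return out

def downsample_vertices(verts, stride=2):
    # Illustrative down-sampling: keep every `stride`-th vertex.
    return verts[::stride]

def first_preset_loss(real_verts, pred_verts):
    # Difference between the real vertex coordinates and the down-sampled
    # preliminary predicted vertex coordinates (mean absolute value assumed).
    return np.abs(real_verts - pred_verts).mean()
```

With these choices, down-sampling the up-sampled vertices recovers the original vertex set, so the loss directly compares like-for-like vertex counts.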
3. The three-dimensional reconstruction method according to claim 1, wherein extracting image frequency information from the image to be processed to obtain the high-frequency detail image information and the low-frequency contour image information corresponding to the image to be processed, and superposing the high-frequency detail image information and the low-frequency contour image information to obtain the image key frequency information feature map comprises:
acquiring a diffuse reflection map of the image to be processed;
separating high-frequency and low-frequency information of the diffuse reflection map to obtain the high-frequency detail image information and the low-frequency contour image information corresponding to the image to be processed;
performing regularization on the high-frequency detail image information and the low-frequency contour image information respectively to obtain regularized high-frequency detail image information and regularized low-frequency contour image information;
and superposing the regularized high-frequency detail image information and the regularized low-frequency contour image information to form the image key frequency information feature map.
4. The three-dimensional reconstruction method according to claim 3, wherein, after the predicted vertex coordinates are displaced based on the image key frequency information feature map to obtain the corrected predicted vertex coordinates and three-dimensional reconstruction is performed based on the corrected predicted vertex coordinates to obtain the target three-dimensional model corresponding to the target object, the method further comprises:
mapping the diffuse reflection map onto the target three-dimensional model to obtain the target three-dimensional model with inherent color and texture, and projecting the target three-dimensional model with inherent color and texture into a two-dimensional pixel space to obtain a rendered two-dimensional image;
and, based on the two-dimensional image and in combination with the pixel values of the image to be processed, performing pixel consistency constraint on the target three-dimensional model by using a second preset loss function to obtain an optimized target three-dimensional model, wherein the second preset loss function is used for calculating a pixel difference between the two-dimensional image and the image to be processed.
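The second preset loss of claim 4 amounts to a direct per-pixel comparison between the rendered two-dimensional image and the image to be processed. Mean absolute error is an assumption here; the claim only specifies "a pixel difference".

```python
import numpy as np

def second_preset_loss(rendered, source):
    # Pixel difference between the rendered two-dimensional image and the
    # image to be processed (mean absolute error assumed).
    rendered = rendered.astype(float)
    source = source.astype(float)
    return np.abs(rendered - source).mean()
```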
5. The three-dimensional reconstruction method according to claim 1, wherein displacing the predicted vertex coordinates based on the image key frequency information feature map to obtain the corrected predicted vertex coordinates, and performing three-dimensional reconstruction based on the corrected predicted vertex coordinates to obtain the target three-dimensional model corresponding to the target object, comprises:
extracting a displacement map from the image key frequency information feature map, wherein the displacement map comprises, for each predicted vertex coordinate, a displacement along the normal direction of that predicted vertex coordinate;
displacing each predicted vertex coordinate along its normal direction according to the obtained displacement to obtain the corrected predicted vertex coordinates;
and constructing the target three-dimensional model corresponding to the target object based on the corrected predicted vertex coordinates.
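The vertex-displacement step of claim 5 can be sketched as follows; per-vertex normals and a scalar displacement already sampled from the displacement map are assumed as inputs, since how they are obtained is specified elsewhere.

```python
import numpy as np

def displace_vertices(verts, normals, displacements):
    # Move each predicted vertex along its (unit-normalized) normal by the
    # scalar displacement sampled from the displacement map.
    unit = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    return verts + displacements[:, None] * unit
```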
6. The three-dimensional reconstruction method according to claim 1, wherein, after the predicted vertex coordinates are displaced based on the image key frequency information feature map to obtain the corrected predicted vertex coordinates and the target three-dimensional model corresponding to the target object is obtained by three-dimensional reconstruction based on the corrected predicted vertex coordinates, the method further comprises:
projecting the target three-dimensional model into a two-dimensional pixel space to obtain a two-dimensional image rendered from the target three-dimensional model;
and extracting image frequency information from the image to be processed and from the two-dimensional image respectively, and, taking the extracted image frequency information of the image to be processed as the reference for that of the two-dimensional image, performing image frequency consistency constraint on the target three-dimensional model by using a third preset loss function to obtain an optimized target three-dimensional model, wherein the third preset loss function is used for calculating an image frequency difference between the two-dimensional image and the image to be processed.
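A sketch of the third preset loss of claim 6, assuming the magnitude of the two-dimensional FFT as the image-frequency representation; the claim leaves the frequency extraction method unspecified, so this choice is illustrative.

```python
import numpy as np

def third_preset_loss(rendered, source):
    # Image frequency difference between the rendered two-dimensional image
    # and the image to be processed, compared via 2-D FFT magnitudes.
    freq_r = np.abs(np.fft.fft2(rendered.astype(float)))
    freq_s = np.abs(np.fft.fft2(source.astype(float)))
    return np.abs(freq_r - freq_s).mean()
```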
7. A three-dimensional reconstruction apparatus, comprising:
the acquisition module is used for acquiring an image to be processed of a target object;
the prediction module is used for performing three-dimensional model vertex prediction based on the image to be processed to obtain predicted vertex coordinates;
the extraction module is used for extracting image frequency information from the image to be processed to obtain high-frequency detail image information and low-frequency contour image information corresponding to the image to be processed, and superposing the high-frequency detail image information and the low-frequency contour image information to obtain an image key frequency information feature map;
and the three-dimensional reconstruction module is used for displacing the predicted vertex coordinates based on the image key frequency information feature map to obtain corrected predicted vertex coordinates, and performing three-dimensional reconstruction based on the corrected predicted vertex coordinates to obtain a target three-dimensional model corresponding to the target object.
8. The three-dimensional reconstruction apparatus of claim 7, wherein the prediction module is specifically configured to:
inputting the image to be processed into a vertex prediction network to obtain preliminary predicted vertex coordinates of the image to be processed;
up-sampling the preliminary predicted vertex coordinates, and down-sampling the up-sampled preliminary predicted vertex coordinates;
and, in combination with the real vertex coordinates of the target object, performing vertex prediction constraint on the down-sampled preliminary predicted vertex coordinates by using a first preset loss function to obtain the predicted vertex coordinates, wherein the first preset loss function is used for calculating a difference between the real vertex coordinates and the down-sampled preliminary predicted vertex coordinates.
9. A terminal device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method according to any one of claims 1 to 6 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111553514.6A CN114373056A (en) | 2021-12-17 | 2021-12-17 | Three-dimensional reconstruction method and device, terminal equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114373056A true CN114373056A (en) | 2022-04-19 |
Family
ID=81139739
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111553514.6A Pending CN114373056A (en) | 2021-12-17 | 2021-12-17 | Three-dimensional reconstruction method and device, terminal equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114373056A (en) |
2021-12-17: CN application CN202111553514.6A filed (publication CN114373056A), status: Pending.
Patent Citations (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080181486A1 (en) * | 2007-01-26 | 2008-07-31 | Conversion Works, Inc. | Methodology for 3d scene reconstruction from 2d image sequences |
US20090177089A1 (en) * | 2008-01-04 | 2009-07-09 | Assaf Govari | Three-dimensional image reconstruction using doppler ultrasound |
CN101958008A (en) * | 2010-10-12 | 2011-01-26 | 上海交通大学 | Automatic texture mapping method in three-dimensional reconstruction of sequence image |
CN107729806A (en) * | 2017-09-05 | 2018-02-23 | 西安理工大学 | Single-view Pose-varied face recognition method based on three-dimensional facial reconstruction |
CN107832681A (en) * | 2017-10-16 | 2018-03-23 | 福州大学 | The high evaluation method of forest list ebon of joint LiDAR point cloud and synchronous remote sensing image |
CN108805977A (en) * | 2018-06-06 | 2018-11-13 | 浙江大学 | A kind of face three-dimensional rebuilding method based on end-to-end convolutional neural networks |
CN110021069A (en) * | 2019-04-15 | 2019-07-16 | 武汉大学 | A kind of method for reconstructing three-dimensional model based on grid deformation |
WO2021044122A1 (en) * | 2019-09-06 | 2021-03-11 | Imperial College Of Science, Technology And Medicine | Scene representation using image processing |
CN113012269A (en) * | 2019-12-19 | 2021-06-22 | 中国科学院深圳先进技术研究院 | Three-dimensional image data rendering method and equipment based on GPU |
CN111428579A (en) * | 2020-03-03 | 2020-07-17 | 平安科技(深圳)有限公司 | Face image acquisition method and system |
WO2021174939A1 (en) * | 2020-03-03 | 2021-09-10 | 平安科技(深圳)有限公司 | Facial image acquisition method and system |
CN113570634A (en) * | 2020-04-28 | 2021-10-29 | 北京达佳互联信息技术有限公司 | Object three-dimensional reconstruction method and device, electronic equipment and storage medium |
CN112150612A (en) * | 2020-09-23 | 2020-12-29 | 上海眼控科技股份有限公司 | Three-dimensional model construction method and device, computer equipment and storage medium |
CN113033305A (en) * | 2021-02-21 | 2021-06-25 | 云南联合视觉科技有限公司 | Living body detection method, living body detection device, terminal equipment and storage medium |
US20210248812A1 (en) * | 2021-03-05 | 2021-08-12 | University Of Electronic Science And Technology Of China | Method for reconstructing a 3d object based on dynamic graph network |
CN113298931A (en) * | 2021-05-14 | 2021-08-24 | 中国科学院深圳先进技术研究院 | Reconstruction method and device of object model, terminal equipment and storage medium |
CN113362450A (en) * | 2021-06-02 | 2021-09-07 | 聚好看科技股份有限公司 | Three-dimensional reconstruction method, device and system |
CN113379815A (en) * | 2021-06-25 | 2021-09-10 | 中德(珠海)人工智能研究院有限公司 | Three-dimensional reconstruction method and device based on RGB camera and laser sensor and server |
CN113610958A (en) * | 2021-07-09 | 2021-11-05 | 云南联合视觉科技有限公司 | 3D image construction method and device based on style migration and terminal |
CN113409384A (en) * | 2021-08-17 | 2021-09-17 | 深圳市华汉伟业科技有限公司 | Pose estimation method and system of target object and robot |
Non-Patent Citations (2)
Title |
---|
YUJUN CAI ET AL.: "Exploiting Spatial-Temporal Relationships for 3D Pose Estimation via Graph Convolutional Networks", 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 27 February 2020 (2020-02-27) * |
孙廓 et al.: "Three-dimensional reconstruction and visualization of the internal microstructure of the human median nerve", 中国矫形外科杂志 (Orthopedic Journal of China), no. 18, 20 September 2008 (2008-09-20) * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114863020A (en) * | 2022-04-29 | 2022-08-05 | 北京工业大学 | Three-dimensional model construction method and device, electronic equipment and storage medium |
CN114818224A (en) * | 2022-05-27 | 2022-07-29 | 中国空气动力研究与发展中心计算空气动力研究所 | Structural grid generation method, device, equipment and storage medium |
CN115330985A (en) * | 2022-07-25 | 2022-11-11 | 埃洛克航空科技(北京)有限公司 | Data processing method and device for three-dimensional model optimization |
CN115330985B (en) * | 2022-07-25 | 2023-09-08 | 埃洛克航空科技(北京)有限公司 | Data processing method and device for three-dimensional model optimization |
CN115761137A (en) * | 2022-11-24 | 2023-03-07 | 之江实验室 | High-precision curved surface reconstruction method and device based on mutual fusion of normal vector and point cloud data |
CN115761137B (en) * | 2022-11-24 | 2023-12-22 | 之江实验室 | High-precision curved surface reconstruction method and device based on mutual fusion of normal vector and point cloud data |
CN117456144A (en) * | 2023-11-10 | 2024-01-26 | 中国人民解放军海军航空大学 | Target building three-dimensional model optimization method based on visible light remote sensing image |
CN117456144B (en) * | 2023-11-10 | 2024-05-07 | 中国人民解放军海军航空大学 | Target building three-dimensional model optimization method based on visible light remote sensing image |
CN117726760A (en) * | 2024-02-07 | 2024-03-19 | 之江实验室 | Training method and device for three-dimensional human body reconstruction model of video |
CN117726760B (en) * | 2024-02-07 | 2024-05-07 | 之江实验室 | Training method and device for three-dimensional human body reconstruction model of video |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||