CN114998507A - Photometric stereo three-dimensional reconstruction method based on self-supervised learning - Google Patents
Photometric stereo three-dimensional reconstruction method based on self-supervised learning
- Publication number
- CN114998507A (application CN202210634582.3A)
- Authority
- CN
- China
- Prior art keywords
- image
- map
- model
- result
- shadow
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/50—Lighting effects
- G06T15/60—Shadow generation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/10—Geometric effects
- G06T15/20—Perspective computation
- G06T15/205—Image-based rendering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/50—Lighting effects
- G06T15/506—Illumination models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/60—Extraction of image or video features relating to illumination properties, e.g. using a reflectance or lighting model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/766—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using regression, e.g. by projecting features on hyperplanes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2215/00—Indexing scheme for image rendering
- G06T2215/12—Shadow map, environment map
Abstract
The invention relates to a photometric stereo three-dimensional reconstruction method based on self-supervised learning, which comprises the following steps: shooting a target scene multiple times under different illumination conditions to obtain an input image set; inputting the image set into a photometric stereo model to obtain a rough normal-map recovery result and the illumination condition of each image; inputting the image set into a reflectance-map estimation model and a shadow estimation model to obtain the reflectance map of the scene and the shadow map of each image; restoring the input images from the rough normal-map result, the illumination conditions, the reflectance map and the shadow maps to obtain a restored image set; training the photometric stereo model in a self-supervised manner according to the similarity between the input image set and the restored image set, and updating the parameters of the photometric stereo model; inputting the image set into the photometric stereo model with updated parameters to obtain an optimized normal-map recovery result and the illumination condition of each image; and performing three-dimensional reconstruction of the target scene from the final normal map.
Description
Technical Field
The invention belongs to the fields of artificial intelligence and computer vision, relates to photometric stereo three-dimensional reconstruction technology, and particularly relates to a photometric stereo three-dimensional reconstruction method based on self-supervised learning.
Background
Three-dimensional reconstruction technology aims to acquire the three-dimensional structure of a real object's surface and express it in a data format that is easy for a computer to store and process. It plays an important role in computer vision problems such as autonomous driving, virtual reality, digital museums and cultural heritage preservation.
Many solutions to the three-dimensional reconstruction problem already exist; the laser scanners and structured-light three-dimensional scanners widely used in practice are active three-dimensional reconstruction methods. Although active methods can reconstruct the three-dimensional structure of a scene accurately, they are slow: reconstructing a single scene often takes three or four hours. In addition, active methods must project laser light onto the target scene, which inevitably damages fragile objects such as cultural relics.
Photometric stereo aims to recover the surface normal information of a scene from multiple images taken under different illumination conditions. The algorithm does not need to interact with the target scene, and acquiring two-dimensional images is far more convenient than operating the acquisition equipment of active methods. However, existing photometric stereo algorithms have an obvious defect: when applied to open real-world scenes with complex materials and illumination, their performance degrades sharply.
Disclosure of Invention
The invention provides a photometric stereo three-dimensional reconstruction method based on self-supervised learning, which uses self-supervised training to optimize a photometric stereo model and improves its adaptability to complex, changeable scene lighting and materials. During training of the photometric stereo model, a reflectance-map estimation model and a shadow estimation model are constructed to estimate the reflectance map of the scene and the distribution of its shadows; the input images are restored according to a Lambertian rendering model, and a reconstruction loss function is constructed for self-supervised optimization of the photometric stereo model. This supports reliable and efficient operation in open environments and improves three-dimensional reconstruction accuracy. The invention is realized by the following technical scheme:
A photometric stereo three-dimensional reconstruction method based on self-supervised learning, characterized by comprising the following steps:
step one, shooting a target scene multiple times under different illumination conditions to obtain an input image set $\mathcal{I} = \{I_k\}_{k=1}^{K}$, where K is the number of input images;
step two, inputting the image set $\mathcal{I}$ into a photometric stereo model to obtain a rough normal-map recovery result $\tilde{N} \in \mathbb{R}^{3 \times P}$ and the illumination condition $l_k$ of each image $I_k$, where P represents the number of pixels in one image and $1 \le k \le K$;
wherein the photometric stereo model is structured as a twin (shared-weight) neural network, which performs feature extraction on each input image $I_k$ and fuses the K features into a global feature of fixed size using a max-pooling operation; the illumination condition of each image is estimated from the global feature and the per-image features, and the global feature is input to a decoder, which regresses the rough normal-map recovery result of the scene;
step three, inputting the image set $\mathcal{I}$ into a reflectance-map estimation model and a shadow estimation model to obtain the reflectance map $A$ of the scene and the shadow map $S_k$ of each image, as follows:
(1) all images in the set $\mathcal{I}$ are concatenated along the channel dimension, and the concatenated result is input to the reflectance-map estimation model, whose output is the estimate of the scene reflectance map $A$;
(2) the coarse normal-map recovery result $\tilde{N}$ and the illumination conditions $\{l_k\}$ are concatenated along the channel dimension, and the result is recorded as tensor B;
(3) each image $I_k$ in the set $\mathcal{I}$ and the corresponding tensor $B_k$ are input to a shadow estimation model, which outputs the shadow map $S_k$ of each image $I_k$, as follows:
the shadow estimation model is structured as an encoder-decoder neural network containing two encoders, each composed of four convolutional layers with 3 × 3 kernels and 64, 128, 256 and 256 kernels per layer, respectively; the input image $I_k$ and the tensor $B_k$ are fed to the two encoders, which extract two depth features; a feature fusion module behind the two encoders concatenates the two depth features along the channel dimension, passes the concatenated features successively through a global pooling layer and a convolutional layer with 1 × 1 kernels, and finally normalizes with a Sigmoid function to output a channel weight matrix; the channel weight matrix is multiplied by the concatenated depth features, fusing them into the final image depth feature; the final depth feature is input to a decoder to obtain the estimate of the shadow map $S_k$;
step four, restoring the input images from the rough normal-map recovery result $\tilde{N}$, the illumination conditions $\{l_k\}$, the reflectance map $A$ and the shadow maps $\{S_k\}$ to obtain a restored image set $\hat{\mathcal{I}} = \{\hat{I}_k\}_{k=1}^{K}$;
step five, training the photometric stereo model in a self-supervised manner according to the similarity between the input image set $\mathcal{I}$ and the restored image set $\hat{\mathcal{I}}$, and updating the parameters of the photometric stereo model;
step six, inputting the image set $\mathcal{I}$ into the photometric stereo model with updated parameters to obtain the optimized normal-map recovery result $N$ and the illumination condition $l_k$ of each image $I_k$;
step seven, performing three-dimensional reconstruction of the target scene from the final normal-map recovery result $N$.
Furthermore, in step three, the reflectance-map estimation model is structured as an encoder-decoder neural network; the encoder is composed of 6 convolutional layers and the decoder of 4 convolutional layers, with a skip connection from each convolutional layer of the decoder to the corresponding shallow layer of the encoder.
Further, the specific method of step four is as follows:
according to the Lambertian rendering formula
$\hat{I}_k = S_k \odot A \odot \max(\tilde{N}^{\top} l_k,\, 0),$
the normal-map recovery result $\tilde{N}$, the illumination conditions $\{l_k\}$, the reflectance map $A$ and the shadow maps $\{S_k\}$ are used to restore each input image $I_k$ separately, constructing the restored image set $\hat{\mathcal{I}}$.
Further, the specific method of step five is as follows:
(1) the mean squared error (MSE) is selected as the measure of similarity between images;
(2) the photometric stereo model is trained in a self-supervised manner with the reconstruction loss function
$\mathcal{L}_{rec} = \frac{1}{KP} \sum_{k=1}^{K} \sum_{i=1}^{P} \left( I_{k,i} - \hat{I}_{k,i} \right)^2,$
where K and P represent the number of input images and the number of pixels in each image, respectively, and i indexes the pixels of an image;
(3) the photometric stereo model is trained until convergence, yielding the optimized photometric stereo model.
The technical scheme provided by the invention has the beneficial effects that:
1. in the three-dimensional reconstruction process of the target scene, the shadow information in the scene can be actively identified and understood, so that the robustness of the photometric stereo model to the scene shadow is improved, and the precision of the three-dimensional reconstruction result is improved.
2. In the three-dimensional reconstruction process of the target scene, the adaptive capacity of the photometric stereo model to the open application scene with complicated and changeable illumination conditions and surface materials can be improved by means of self-supervision optimization training.
Drawings
FIG. 1 is a flow chart of a photometric stereo three-dimensional reconstruction method based on self-supervised learning;
FIG. 2 is a flow chart of an auto-supervised photometric stereo model;
FIG. 3 is a flow chart of the shadow estimation model in FIG. 2;
FIG. 4 is a quantitative comparison between the method of the present invention and eight state-of-the-art photometric stereo three-dimensional reconstruction methods;
FIG. 5 is a visual comparison between the method of the present invention and the best existing photometric stereo three-dimensional reconstruction method.
Detailed Description
The technical scheme of the invention is described clearly and completely below with reference to the accompanying drawings. All other embodiments obtained by those skilled in the art without creative effort on the basis of the technical solutions of the present invention fall within the protection scope of the present invention.
(I) Shooting the target scene multiple times under different illumination conditions to obtain the input image set $\mathcal{I}$. The specific method is as follows:
(1) acquire K photographs of the target scene under K different illumination conditions, giving the picture set $\mathcal{I} = \{I_k\}_{k=1}^{K}$.
Typically K = 13: an active light source (e.g., an area light) illuminates the scene from the 1 o'clock through 12 o'clock directions and from the front, and an image of the scene is acquired for each direction. Note that the requirements on the light-source directions are not strict; it is only necessary to ensure that the images are taken under multi-angle illumination.
To support the training of the neural networks in the invention, a rendering engine is used to construct a synthetic virtual-scene dataset with ground-truth values for all image attributes, named the TJU-Synth-A dataset. The dataset comprises 1136 virtual scenes and provides, for each scene, the normal map, reflectance map and shadow maps, 100 imaging results under different illumination conditions, and the corresponding illumination information.
The invention adopts the three-dimensional models provided by the Sculpture 3D model dataset, removes data with incomplete geometry or material information, and finally screens out 142 three-dimensional models. The screened 3D models are rendered with the Unity3D rendering engine to obtain scene images. Each 3D model is observed from 8 different camera viewpoints, yielding 8 different scenes per model; in each scene, imaging results are acquired under 100 different directional lights. The invention thus obtains 142 × 8 = 1136 scenes, each containing 100 images. The scene normal map, reflectance map and the illumination information corresponding to each image can be read directly through the GetComponent interface provided by the Unity3D engine, but the shadow information of the scene surface cannot be acquired directly in Unity3D. To obtain the ground-truth shadow maps, the following method is adopted: after the imaging result of a scene under one light direction is obtained, all surface materials of the objects in the current scene are set to white diffuse (Lambertian) material; the Cast Shadow option in Unity3D is then switched on and off in turn, producing, under identical illumination, two imaging results of the current scene with and without cast shadows. The shadow map S of the scene is then obtained by pixel-wise division of the two imaging results.
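As a minimal sketch of this shadow-map extraction (assuming the two renders are floating-point arrays of identical shape; the epsilon guard and clipping are added safeguards the patent does not specify), the pixel-wise division could be written as:

```python
import numpy as np

def shadow_map(render_with_shadow: np.ndarray,
               render_no_shadow: np.ndarray,
               eps: float = 1e-6) -> np.ndarray:
    """Pixel-wise ratio of the shadowed render to the shadow-free render.

    Values near 0 mark cast shadow; values near 1 mark fully lit surface.
    """
    s = render_with_shadow / np.maximum(render_no_shadow, eps)
    return np.clip(s, 0.0, 1.0)
```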
(II) Inputting the image set $\mathcal{I}$ into the photometric stereo model to obtain the rough normal-map recovery result $\tilde{N}$ and the illumination condition $l_k$ of each image $I_k$. The specific method is as follows:
(1) all images in the set $\mathcal{I}$ are input to the photometric stereo model, whose outputs are the rough normal-map recovery result $\tilde{N}$ and the illumination condition $l_k$ of each image $I_k$.
Description 2: construction and training of the photometric stereo model
The photometric stereo model is structured as a twin (shared-weight) neural network that performs feature extraction on each input image $I_k$ and fuses the K per-image features into a fixed-size global feature using a max-pooling operation. The network first estimates the illumination condition of each image from the global feature together with the per-image feature, then inputs the global feature to a decoder, which regresses the normal-map prediction of the scene.
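The patent gives no layer-level specification for this network, so the following PyTorch sketch only illustrates the shared-extractor/max-pool-fusion pattern described above; the layer widths, the two-layer extractor and the lighting head are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PhotometricStereoNet(nn.Module):
    """Shared-weight feature extraction per image, max-pooling fusion across
    the K images, per-image lighting regression, global normal decoding."""
    def __init__(self, feat: int = 64):
        super().__init__()
        self.extractor = nn.Sequential(          # shared across all K inputs
            nn.Conv2d(3, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(inplace=True))
        self.light_head = nn.Sequential(         # per-image lighting l_k
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(2 * feat, 3))
        self.normal_decoder = nn.Sequential(     # global feature -> normal map
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, 3, 3, padding=1))

    def forward(self, images: torch.Tensor):     # images: (K, 3, H, W)
        feats = self.extractor(images)           # (K, feat, H, W)
        global_feat = feats.max(dim=0).values    # max-pool fusion over K
        k = images.shape[0]
        per_image = torch.cat(
            [feats, global_feat.unsqueeze(0).expand(k, -1, -1, -1)], dim=1)
        lights = self.light_head(per_image)      # (K, 3)
        normals = F.normalize(
            self.normal_decoder(global_feat.unsqueeze(0)), dim=1)[0]
        return normals, lights                   # (3, H, W), (K, 3)
```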
The photometric stereo model is pretrained in a supervised manner on the Blobby and Sculpture photometric stereo datasets.
(III) Inputting the image set $\mathcal{I}$ into the reflectance-map estimation model and the shadow estimation model to obtain the reflectance map $A$ of the scene and the shadow map $S_k$ of each image. The specific method is as follows:
(1) all images in the set $\mathcal{I}$ are concatenated along the channel dimension, and the concatenated result is input to the reflectance-map estimation model, whose output is the estimate of the scene reflectance map $A$.
Description 3: structure and training of the reflectance-map estimation model
The reflectance-map estimation model is structured as an encoder-decoder neural network: the encoder is composed of 6 convolutional layers and the decoder of 4 convolutional layers, with a skip connection from each convolutional layer of the decoder to the corresponding shallow layer of the encoder.
The reflectance-map estimation model is trained in a supervised manner on the TJU-Synth-A reflectance-map prediction dataset.
(2) The normal map $\tilde{N}$ and the lighting conditions $\{l_k\}$ are concatenated along the channel dimension, and the result is recorded as tensor B.
Each $l_k$ is a 3 × 1 vector, and $\tilde{N}$ is a 3 × h × w tensor (h and w are the image height and width, respectively). First $l_k$ is replicated and expanded to the same dimensions as $\tilde{N}$, then concatenated with $\tilde{N}$ to form the tensor $B_k$ (dimension 6 × h × w).
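A brief sketch of this construction (assuming PyTorch tensors shaped as stated above; the function name is illustrative):

```python
import torch

def build_B(coarse_normals: torch.Tensor, light: torch.Tensor) -> torch.Tensor:
    """Tile the light vector l_k over the image plane and stack it
    under the coarse normal map.

    coarse_normals: (3, h, w); light: (3,).  Returns B_k of shape (6, h, w).
    """
    _, h, w = coarse_normals.shape
    tiled = light.view(3, 1, 1).expand(3, h, w)   # replicate l_k to every pixel
    return torch.cat([coarse_normals, tiled], dim=0)
```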
(3) Each image $I_k$ in the set $\mathcal{I}$ and the corresponding tensor $B_k$ are input to the shadow estimation model, whose output is the estimate of the shadow map $S_k$ of each image $I_k$.
Description 5: structure and training of the shadow estimation model
The shadow estimation model is structured as an encoder-decoder neural network. The network contains two encoders, each composed of four convolutional layers with 3 × 3 kernels and a stride of 2; the number of kernels per layer is 64, 128, 256 and 256, respectively. First, the input image $I_k$ and the tensor $B_k$ are fed to the two encoders, which extract two depth features. The two encoders are followed by a feature fusion module: it concatenates the two depth features along the channel dimension, passes the concatenated features through a global pooling layer and a convolutional layer with 1 × 1 kernels, and finally normalizes with a Sigmoid function to output a channel weight matrix. The role of this weight matrix is to learn, during training, the differences in importance between channels and to assign each feature channel a fusion weight accordingly. The feature fusion module then multiplies the channel weight matrix with the concatenated depth features, fusing them into the final image feature, which is input to the decoder. The decoder is composed of 4 deconvolution layers and one convolutional layer, with a skip connection after each deconvolution layer.
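A sketch of the feature fusion module under the description above (the channel count of the last encoder layer is taken as 256, and a batch dimension is added for generality; both are assumptions):

```python
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    """Channel-attention fusion of the image feature and the B-tensor feature."""
    def __init__(self, channels: int = 256):
        super().__init__()
        self.weight_net = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                   # global pooling
            nn.Conv2d(2 * channels, 2 * channels, 1),  # 1x1 convolution
            nn.Sigmoid())                              # per-channel weights in (0, 1)

    def forward(self, feat_img: torch.Tensor, feat_b: torch.Tensor):
        cat = torch.cat([feat_img, feat_b], dim=1)     # (N, 2C, h, w)
        weights = self.weight_net(cat)                 # (N, 2C, 1, 1)
        return cat * weights                           # re-weighted fused feature
```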
The shadow estimation model is trained in a supervised manner on the TJU-Synth-A shadow estimation dataset.
(IV) Restoring the input images from the normal-map recovery result $\tilde{N}$, the illumination conditions $\{l_k\}$, the reflectance map $A$ and the shadow maps $\{S_k\}$ to obtain the restored image set $\hat{\mathcal{I}}$. The specific method is as follows:
(1) after the normal-map recovery result $\tilde{N}$, the illumination conditions $\{l_k\}$, the reflectance map $A$ and the shadow maps $\{S_k\}$ are obtained, each input image $I_k$ is restored according to the Lambertian rendering formula
$\hat{I}_k = S_k \odot A \odot \max(\tilde{N}^{\top} l_k,\, 0),$
thereby constructing the restored image set $\hat{\mathcal{I}} = \{\hat{I}_k\}_{k=1}^{K}$.
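A sketch of this restoration step, matching the reconstructed formula above (PyTorch, with shapes as in the earlier sketches):

```python
import torch

def lambertian_render(normals: torch.Tensor, light: torch.Tensor,
                      albedo: torch.Tensor, shadow: torch.Tensor) -> torch.Tensor:
    """Restore one image as I_hat = S * A * max(<n, l>, 0), per pixel.

    normals: (3, h, w); light: (3,); albedo: (C, h, w); shadow: (1, h, w).
    """
    shading = (normals * light.view(3, 1, 1)).sum(dim=0, keepdim=True)
    return shadow * albedo * shading.clamp(min=0.0)   # (C, h, w)
```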
(V) Updating the model parameters of the photometric stereo model
The photometric stereo model is trained in a self-supervised manner according to the similarity between the input image set $\mathcal{I}$ and the restored image set $\hat{\mathcal{I}}$. The specific method for updating the model parameters is as follows:
(1) the photometric stereo model is trained with a self-supervised strategy by minimizing the reconstruction loss function $\mathcal{L}_{rec}$ defined below, and the network parameters are updated.
Description 6: form of the reconstruction loss function
The invention adopts the widely used mean squared error (MSE) as the measure of image similarity, so the reconstruction loss can be written as
$\mathcal{L}_{rec} = \frac{1}{KP} \sum_{k=1}^{K} \sum_{i=1}^{P} \left( I_{k,i} - \hat{I}_{k,i} \right)^2,$
where K and P are the number of input images and the number of pixels per image, respectively, and i indexes the pixels.
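A sketch of one self-supervised update using this loss (reusing the PhotometricStereoNet and lambertian_render sketches above; the Adam optimizer is an illustrative assumption, as the patent does not name one):

```python
import torch

def self_supervised_step(model, images, albedo, shadows, optimizer):
    """One parameter update of the photometric stereo model.

    images: (K, 3, H, W); albedo and shadows come from the (frozen)
    reflectance-map and shadow estimation models.
    """
    normals, lights = model(images)
    restored = torch.stack([
        lambertian_render(normals, lights[k], albedo, shadows[k])
        for k in range(images.shape[0])])
    loss = torch.mean((images - restored) ** 2)   # MSE reconstruction loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# usage: optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```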
(VI) Obtaining the final normal-map recovery result $N$
The image set $\mathcal{I}$ is input to the photometric stereo model with updated parameters to obtain the final normal-map recovery result $N$. The specific method is as follows:
(1) all images in the set $\mathcal{I}$ are input to the photometric stereo model after self-supervised training is complete; the model outputs the optimized normal-map recovery result $N$ and the illumination condition $l_k$ of each image $I_k$.
(VII) Reconstructing the surface of the target scene from the final normal-map recovery result
The specific method for three-dimensionally reconstructing the target surface from the final normal-map recovery result $N$ is as follows:
(1) the gradient field is computed from the normal-map recovery result $N$, and the target surface is reconstructed under an integrability constraint. The reconstruction objective function is defined as
$\min_{Z} \iint_{\Omega} \left\| \nabla Z(u, v) - g(u, v) \right\|^2 \, \mathrm{d}u \, \mathrm{d}v,$
where (u, v) are the pixel coordinates of a point in the image, Ω is the integration region, Z is the depth of the object surface, and g is the gradient field of the target scene. This problem is generally solved via the Poisson equation.
Description 7: obtaining the depth Z of the object surface
Let the surface of the target scene be represented by the depth function $Z = f(x, y)$, and let (p, q) denote the gradient field at a point of the scene; the normal vector at that point can then be written as
$n = \frac{1}{\sqrt{p^2 + q^2 + 1}} \, (-p,\ -q,\ 1)^{\top}.$
On this basis, the surface depth $Z = f(x, y)$ can be solved with the Poisson equation
$\nabla^2 Z = \frac{\partial p}{\partial x} + \frac{\partial q}{\partial y}.$
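One standard way to solve this Poisson problem is the Fourier-domain (Frankot-Chellappa) integrator; the patent does not name a particular solver, so the following NumPy sketch is an assumption:

```python
import numpy as np

def integrate_gradients(p: np.ndarray, q: np.ndarray) -> np.ndarray:
    """Recover depth Z from gradient fields p = dZ/dx, q = dZ/dy by solving
    the Poisson equation in the Fourier domain (Frankot-Chellappa).

    For a unit normal map (n_x, n_y, n_z): p = -n_x / n_z, q = -n_y / n_z.
    """
    h, w = p.shape
    u = np.fft.fftfreq(w) * 2.0 * np.pi
    v = np.fft.fftfreq(h) * 2.0 * np.pi
    u, v = np.meshgrid(u, v)
    P, Q = np.fft.fft2(p), np.fft.fft2(q)
    denom = u ** 2 + v ** 2
    denom[0, 0] = 1.0                 # avoid dividing by zero at the DC term
    Z = np.real(np.fft.ifft2((-1j * u * P - 1j * v * Q) / denom))
    return Z - Z.mean()               # depth is recovered up to a constant
```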
(2) a visualization program is written using NumPy, an extension library of the Python programming language, to visually display the reconstructed surface.
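A minimal visualization sketch (matplotlib is assumed here alongside NumPy, since the patent names only NumPy):

```python
import numpy as np
import matplotlib.pyplot as plt

def show_surface(depth: np.ndarray) -> None:
    """Display the recovered depth map as a 3D surface."""
    h, w = depth.shape
    X, Y = np.meshgrid(np.arange(w), np.arange(h))
    ax = plt.figure().add_subplot(projection="3d")
    ax.plot_surface(X, Y, depth, cmap="viridis", linewidth=0)
    ax.set_xlabel("u"); ax.set_ylabel("v"); ax.set_zlabel("Z")
    plt.show()
```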
The following example is presented to demonstrate the feasibility of the method of the invention, as described in detail below:
The method of the invention was validated on the DiLiGenT dataset and the Dunhuang Mogao Grottoes dataset. The DiLiGenT dataset contains 10 test scenes, each with 96 pictures imaged under different lighting conditions; it also provides the normal ground truth and light calibration information for each scene surface. The Dunhuang Mogao Grottoes dataset contains 15 scenes, each with 13 pictures imaged under different lighting conditions; this test set contains no normal ground truth or light calibration information. The experiments use the mean angular error (MAE) to quantitatively evaluate three-dimensional reconstruction on the DiLiGenT dataset. Because the Dunhuang Mogao Grottoes dataset contains no ground truth, its results are only shown visually and are not evaluated quantitatively.
The test results on the DiLiGenT dataset, compared in fig. 4 with eight state-of-the-art photometric stereo three-dimensional reconstruction methods, show that the reconstruction of the present method attains a lower mean angular error and higher reconstruction precision, demonstrating its effectiveness. Meanwhile, the visual reconstructions in fig. 5 show that, compared with the best existing method SDPS-Net, the present method reconstructs the relic surfaces in more detail and has higher practicality.
Claims (4)
1. A photometric stereo three-dimensional reconstruction method based on self-supervised learning, characterized by comprising the following steps:
step one, shooting a target scene multiple times under different illumination conditions to obtain an input image set $\mathcal{I} = \{I_k\}_{k=1}^{K}$, where K is the number of input images;
step two, inputting the image set $\mathcal{I}$ into a photometric stereo model to obtain a rough normal-map recovery result $\tilde{N} \in \mathbb{R}^{3 \times P}$ and the illumination condition $l_k$ of each image $I_k$, where P represents the number of pixels in one image and $1 \le k \le K$;
wherein the photometric stereo model is structured as a twin (shared-weight) neural network, which performs feature extraction on each input image $I_k$ and fuses the K features into a global feature of fixed size using a max-pooling operation; the illumination condition of each image is estimated from the global feature and the per-image features, and the global feature is input to a decoder, which regresses the rough normal-map recovery result of the scene;
step three, inputting the image set $\mathcal{I}$ into a reflectance-map estimation model and a shadow estimation model to obtain the reflectance map $A$ of the scene and the shadow map $S_k$ of each image, as follows:
(1) all images in the set $\mathcal{I}$ are concatenated along the channel dimension, and the concatenated result is input to the reflectance-map estimation model, whose output is the estimate of the scene reflectance map $A$;
(2) the coarse normal-map recovery result $\tilde{N}$ and the illumination conditions $\{l_k\}$ are concatenated along the channel dimension, and the result is recorded as tensor B;
(3) each image $I_k$ in the set $\mathcal{I}$ and the corresponding tensor $B_k$ are input to a shadow estimation model, which outputs the shadow map $S_k$ of each image $I_k$, as follows:
the shadow estimation model is structured as an encoder-decoder neural network containing two encoders, each composed of four convolutional layers with 3 × 3 kernels and 64, 128, 256 and 256 kernels per layer, respectively; the input image $I_k$ and the tensor $B_k$ are fed to the two encoders, which extract two depth features; a feature fusion module behind the two encoders concatenates the two depth features along the channel dimension, passes the concatenated features successively through a global pooling layer and a convolutional layer with 1 × 1 kernels, and finally normalizes with a Sigmoid function to output a channel weight matrix; the channel weight matrix is multiplied by the concatenated depth features, fusing them into the final image depth feature; the final depth feature is input to a decoder to obtain the estimate of the shadow map $S_k$;
step four, restoring the input images from the rough normal-map recovery result $\tilde{N}$, the illumination conditions $\{l_k\}$, the reflectance map $A$ and the shadow maps $\{S_k\}$ to obtain a restored image set $\hat{\mathcal{I}} = \{\hat{I}_k\}_{k=1}^{K}$;
step five, training the photometric stereo model in a self-supervised manner according to the similarity between the input image set $\mathcal{I}$ and the restored image set $\hat{\mathcal{I}}$, and updating the parameters of the photometric stereo model;
step six, inputting the image set $\mathcal{I}$ into the photometric stereo model with updated parameters to obtain the optimized normal-map recovery result $N$ and the illumination condition $l_k$ of each image $I_k$;
step seven, performing three-dimensional reconstruction of the target scene from the final normal-map recovery result $N$.
2. The photometric stereo three-dimensional reconstruction method based on self-supervised learning as claimed in claim 1, wherein the reflectance-map estimation model is structured as an encoder-decoder neural network; the encoder is composed of 6 convolutional layers and the decoder of 4 convolutional layers, with a skip connection from each convolutional layer of the decoder to the corresponding shallow layer of the encoder.
3. The photometric stereo three-dimensional reconstruction method based on self-supervised learning as claimed in claim 1, wherein the specific method of step four is as follows:
according to the Lambertian rendering formula
$\hat{I}_k = S_k \odot A \odot \max(\tilde{N}^{\top} l_k,\, 0),$
the normal-map recovery result $\tilde{N}$, the illumination conditions $\{l_k\}$, the reflectance map $A$ and the shadow maps $\{S_k\}$ are used to restore each input image $I_k$, constructing the restored image set $\hat{\mathcal{I}}$.
4. The photometric stereo three-dimensional reconstruction method based on self-supervised learning as claimed in claim 1, wherein the specific method of step five is as follows:
(1) the mean squared error (MSE) is selected as the measure of similarity between images;
(2) the photometric stereo model is trained in a self-supervised manner with the reconstruction loss function
$\mathcal{L}_{rec} = \frac{1}{KP} \sum_{k=1}^{K} \sum_{i=1}^{P} \left( I_{k,i} - \hat{I}_{k,i} \right)^2,$
where K and P represent the number of input images and the number of pixels in each image, respectively, and i indexes the pixels of an image;
(3) the photometric stereo model is trained until convergence, obtaining the optimized photometric stereo model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210634582.3A CN114998507A (en) | 2022-06-07 | 2022-06-07 | Photometric stereo three-dimensional reconstruction method based on self-supervised learning
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210634582.3A CN114998507A (en) | 2022-06-07 | 2022-06-07 | Photometric stereo three-dimensional reconstruction method based on self-supervised learning
Publications (1)
Publication Number | Publication Date |
---|---|
CN114998507A true CN114998507A (en) | 2022-09-02 |
Family
ID=83033836
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210634582.3A Pending CN114998507A (en) | 2022-06-07 | 2022-06-07 | Photometric stereo three-dimensional reconstruction method based on self-supervised learning
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114998507A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116105632A (en) * | 2023-04-12 | 2023-05-12 | 四川大学 | Self-supervision phase unwrapping method and device for structured light three-dimensional imaging |
CN116883578A (en) * | 2023-09-06 | 2023-10-13 | 腾讯科技(深圳)有限公司 | Image processing method, device and related equipment |
CN116883578B (en) * | 2023-09-06 | 2023-12-19 | 腾讯科技(深圳)有限公司 | Image processing method, device and related equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11257272B2 (en) | Generating synthetic image data for machine learning | |
CN112258390B (en) | High-precision microscopic virtual learning resource generation method | |
US11334762B1 (en) | Method for image analysis | |
Mayer et al. | A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation | |
CN114998507A (en) | Photometric stereo three-dimensional reconstruction method based on self-supervised learning | |
CN110910437B (en) | Depth prediction method for complex indoor scene | |
CN113112504A (en) | Plant point cloud data segmentation method and system | |
CN113572962A (en) | Outdoor natural scene illumination estimation method and device | |
US11875583B2 (en) | Dataset generation method for self-supervised learning scene point cloud completion based on panoramas | |
CN110633628A (en) | RGB image scene three-dimensional model reconstruction method based on artificial neural network | |
US20230419600A1 (en) | Volumetric performance capture with neural rendering | |
CN114514561A (en) | Neural light transmission | |
CN115457188A (en) | 3D rendering display method and system based on fixation point | |
CN115082254A (en) | Lean control digital twin system of transformer substation | |
Yeh et al. | Photoscene: Photorealistic material and lighting transfer for indoor scenes | |
Shinohara et al. | Point2color: 3d point cloud colorization using a conditional generative network and differentiable rendering for airborne lidar | |
CN113763231A (en) | Model generation method, image perspective determination device, image perspective determination equipment and medium | |
CN115272599A (en) | Three-dimensional semantic map construction method oriented to city information model | |
CN114332355A (en) | Weak light multi-view geometric reconstruction method based on deep learning | |
CN114663880A (en) | Three-dimensional target detection method based on multi-level cross-modal self-attention mechanism | |
Schambach et al. | A multispectral light field dataset and framework for light field deep learning | |
CN116433822B (en) | Neural radiation field training method, device, equipment and medium | |
CN116311218A (en) | Noise plant point cloud semantic segmentation method and system based on self-attention feature fusion | |
CN116071278A (en) | Unmanned aerial vehicle aerial image synthesis method, system, computer equipment and storage medium | |
CN115953447A (en) | Point cloud consistency constraint monocular depth estimation method for 3D target detection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |