CN113822825B - Optical building target three-dimensional reconstruction method based on 3D-R2N2 - Google Patents
- Publication number
- CN113822825B (application CN202111409413.1A)
- Authority
- CN
- China
- Prior art keywords
- layer
- input end
- convolution layer
- convolution
- dimensional
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
- G06T5/70 — Image enhancement or restoration: denoising; smoothing
- G06T5/40 — Image enhancement or restoration using histogram techniques
- G06T5/90 — Dynamic range modification of images or parts thereof
- G06T7/344 — Determination of transform parameters for the alignment of images (image registration) using feature-based methods involving models
- G06T17/00 — Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06N3/044 — Recurrent networks, e.g. Hopfield networks
- G06N3/045 — Combinations of networks
- G06N3/08 — Learning methods
- G06T2207/20016 — Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; pyramid transform
- G06T2207/20081 — Training; learning
- G06T2207/20084 — Artificial neural networks [ANN]
Abstract
The invention discloses a 3D-R2N2-based method for three-dimensional reconstruction of optical building targets, in the technical field of three-dimensional reconstruction. The method acquires an optical image and preprocesses it; constructs a 3D-R2N2 network, which includes a CNN module, and feeds the preprocessed optical image into it; extracts and encodes the image features, compressing them into a low-dimensional feature vector; sends that vector into a 3D-LSTM unit to obtain a three-dimensional grid structure composed of voxels; passes the grid structure through a decoder, which converts the voxels into a three-dimensional probability matrix; and performs voxel reconstruction from the probability matrix, completing the three-dimensional reconstruction of the optical building target. The method stabilizes model training, improves convergence, raises the accuracy of the reconstructed model, and recovers more accurate images with good visual quality.
Description
Technical Field
The invention relates to the technical field of three-dimensional reconstruction, and in particular to a 3D-R2N2-based method for three-dimensional reconstruction of optical building targets.
Background
Three-dimensional reconstruction refers to building a mathematical model of a target object that is suitable for computer representation and processing. With the development of modern science and technology, three-dimensional building reconstruction has attracted wide attention: three-dimensional models have become a key element of urban geospatial data frameworks, and how to reconstruct urban areas quickly, automatically and accurately, especially buildings with complex shapes, is a current research hotspot across many fields. In 2015, 3D ShapeNets, the first three-dimensional reconstruction network based on voxel representation, was proposed, but it suffers from matching problems such as texture defects, specular reflection and baseline issues. In 2016, the 3D-R2N2 method was proposed; it mainly solves the object feature-matching problem, but its reconstruction accuracy and efficiency are limited. The WarpNet framework, based on a convolutional neural network, achieves reconstruction quality close to that of supervised methods, but its reconstructed targets are distorted. The MarrNet model, trained end-to-end on real images, suffers from heavy computation and a lack of fine geometric detail. In 2017, the B-Rep algorithm was applied to three-dimensional reconstruction; it is a polyhedron-oriented algorithm and is only suitable for simple polyhedra. In 2018, images containing complex objects were reconstructed in three dimensions with a voxel-level reconstruction algorithm, but its accuracy is low on low-resolution images. Traditional three-dimensional reconstruction thus suffers from low modeling efficiency, poor visual quality of the models, and low modeling accuracy in texture-missing regions, which places higher demands on reconstruction algorithms.
Disclosure of Invention
To address these deficiencies in the prior art, the 3D-R2N2-based optical building target three-dimensional reconstruction method presented here solves the problems of low modeling efficiency, poor model visual quality, and low modeling accuracy in texture-missing regions.
In order to achieve the purpose of the invention, the invention adopts the technical scheme that:
the method for three-dimensional reconstruction of the optical building target based on the 3D-R2N2 is provided, and comprises the following steps:
s1, acquiring an optical image and preprocessing the optical image;
s2, constructing a 3D-R2N2 network, and inputting the preprocessed optical image into the constructed 3D-R2N2 network; the 3D-R2N2 network comprises an image extraction module, a pyramid pooling layer, a CNN module and a 3D-LSTM unit which are connected in sequence;
s3, adjusting the size of the input optical image to be a uniform size through the pyramid pooling layer;
S4, extracting features from the uniformly sized optical image by using the deep-residual-variant CNN module of the 3D-R2N2 network, and encoding the extracted features;
S5, performing one-dimensional convolution on the encoded features, and compressing them through the encoder into a 1024-dimensional feature vector, namely the low-dimensional feature vector;
S6, sending the low-dimensional feature vector into the 3D-LSTM unit to obtain a three-dimensional grid structure; wherein the three-dimensional grid structure comprises voxels;
S7, inputting the three-dimensional grid structure into a decoder, and raising the hidden-state resolution of the three-dimensional grid structure through the decoder until the target output resolution is reached;
S8, converting the three-dimensional grid structure that has reached the target output resolution into voxel existence probabilities at the voxel coordinate points by using a cross entropy loss function, and processing the probabilities into Bernoulli distribution form;
S9, assembling the Bernoulli-distributed voxel existence probabilities at the voxel coordinate points into a three-dimensional probability matrix;
and S10, performing voxel reconstruction through the three-dimensional probability matrix, thereby completing the three-dimensional reconstruction of the optical building target.
Further, the specific method of preprocessing in step S1 is:
S1-1, according to the formula:

$$\min J(u)=\int_{\Omega}\left|\nabla u\right|\,dx\,dy=\int_{\Omega}\sqrt{\left(\frac{\partial u}{\partial x}\right)^{2}+\left(\frac{\partial u}{\partial y}\right)^{2}}\,dx\,dy$$

obtaining the minimized total variation $J(u)$; wherein $\nabla$ denotes differentiation of $u$, $\Omega$ is the definition domain of the pixel points, $u_{0}$ is the original clear simulated-noise high-frequency image from which $u$ is recovered, $x$ is the $x$ coordinate and $y$ the $y$ coordinate of a pixel in that image, $\nabla u$ is the differential operator applied to $u$, i.e. to the optical image, and $\partial u/\partial x$ and $\partial u/\partial y$ are the derivatives with respect to the pixel coordinates $x$ and $y$;
s1-2, carrying out noise reduction processing on the optical image by utilizing the minimized total variation;
s1-3, processing the optical image after noise reduction into a vertical image, and graying the vertical image to obtain a gray image;
s1-4, extending all the area spaces with concentrated gray scales in the gray scale image to all the gray scale area space ranges to obtain a non-uniform extension and stretching gray scale image;
and S1-5, redistributing the pixel values of the non-uniform extension stretched gray-scale image to finish preprocessing.
Further, in step S2 the CNN module comprises 12 convolution layers, 5 residual connection layers, 4 bottleneck layers and 1 transition layer; the residual connection layers comprise a first convolution layer, a first Leaky_ReLU activation function layer, a second Leaky_ReLU activation function layer, a third Leaky_ReLU activation function layer and a PC Layer (path control layer); the first to fourth bottleneck layers each comprise a BN normalization layer and a ReLU activation function layer; the first convolution layer uses 7 × 7 convolution kernels, and the second to thirteenth convolution layers each use 3 × 3 convolution kernels;
the first convolution layer, the first Leaky_ReLU activation function layer, the second convolution layer, the third convolution layer, the fourth convolution layer, the fifth convolution layer, the sixth convolution layer, the seventh convolution layer, the eighth convolution layer, the second Leaky_ReLU activation function layer, the ninth convolution layer, the first bottleneck layer, the tenth convolution layer, the second bottleneck layer, the eleventh convolution layer, the third bottleneck layer, the twelfth convolution layer, the fourth bottleneck layer, the thirteenth convolution layer, the transition layer, the third Leaky_ReLU activation function layer and the PC Layer path control layer are connected in sequence; wherein the first convolution layer is the input layer, and the PC Layer path control layer is the output layer;
the output end of the second convolution layer is respectively connected with the input end of the third convolution layer, the input end of the fourth convolution layer, the input end of the fifth convolution layer, the input end of the sixth convolution layer and the input end of the seventh convolution layer; the output end of the third convolution layer is respectively connected with the input end of the fourth convolution layer, the input end of the fifth convolution layer, the input end of the sixth convolution layer and the input end of the seventh convolution layer; the output end of the fourth convolution layer is respectively connected with the input end of the fifth convolution layer, the input end of the sixth convolution layer and the input end of the seventh convolution layer; the output end of the fifth convolution layer is respectively connected with the input end of the sixth convolution layer and the input end of the seventh convolution layer; the output end of the sixth convolution layer is connected with the input end of the seventh convolution layer;
the output end of the ninth convolution layer is respectively connected with the input end of the first bottleneck layer, the input end of the second bottleneck layer, the input end of the third bottleneck layer, the input end of the fourth bottleneck layer and the input end of the transition layer; the output end of the tenth convolution layer is respectively connected with the input end of the second bottleneck layer, the input end of the third bottleneck layer, the input end of the fourth bottleneck layer and the input end of the transition layer; the output end of the eleventh convolution layer is respectively connected with the input end of the third bottleneck layer, the input end of the fourth bottleneck layer and the input end of the transition layer; the output end of the twelfth convolution layer is respectively connected with the input end of the fourth bottleneck layer and the input end of the transition layer; the output end of the thirteenth convolution layer is connected to the input end of the transition layer.
Further, the 3D-LSTM unit in step S2 comprises several basic modules; each basic module comprises a current-input fully connected layer, the output end of which is connected respectively with the input ends of the first, second and third fully connected layers; the output end of the first fully connected layer and the output end of the first 3 × 3 × 3 convolution layer are connected to the input ends of the first adder; the output end of the second fully connected layer and the output end of the second 3 × 3 × 3 convolution layer are connected to the input ends of the second adder; the output end of the third fully connected layer and the output end of the third 3 × 3 × 3 convolution layer are connected to the input ends of the third adder; the output ends of the second adder and the third adder are respectively connected with the input ends of the first multiplier, and the output end of the first multiplier is connected with an input end of the fourth adder; the input ends of the first, second and third 3 × 3 × 3 convolution layers are each connected with the output end of the hidden layer at the previous moment; the output end of the first adder is connected with one input end of the second multiplier, whose other input end is connected with the storage unit at the previous moment; the output end of the second multiplier is connected with another input end of the fourth adder; and the output end of the fourth adder is connected respectively with the hidden layer at the current moment and the storage unit at the current moment.
Further, the pyramid pooling layers in step S2 include a feature map 16-block division layer, a feature map 4-block division layer, and a feature map 1-block division layer.
Further, the decoder in step S7 includes several sets of two-dimensional vector matrices, a pooling layer, a one-dimensional vector matrix, a fully connected layer and a Softmax activation layer.
Further, the formula of the cross entropy loss function in step S8 is:

$$L=-\sum_{(i,j,k)}\left[y_{(i,j,k)}\log p_{(i,j,k)}+\left(1-y_{(i,j,k)}\right)\log\left(1-p_{(i,j,k)}\right)\right]$$

wherein $L$ is the cross-entropy loss function, $(i,j,k)$ is a voxel coordinate point, $y_{(i,j,k)}$ is the real voxel point, $p_{(i,j,k)}$ is the voxel probability, and $\log$ is the logarithmic function with base 10.
The invention has the beneficial effects that:
1. A densely connected CNN module is designed, in which every convolution layer is connected to all subsequent convolution layers, so that each layer adds its own feature maps, later layers obtain the information of earlier layers, and even the last convolution layer obtains the information of the first; this makes full use of the channels and completes the information transmission;
2. A bottleneck layer structure is added between the convolution layers of the densely connected CNN module, which reduces the dimensionality of the convolution results and alleviates the dimensionality growth caused by each convolution layer receiving a large amount of feature information; with reduced dimensionality, training is stable and coding efficiency is improved;
3. A pyramid pooling layer is added between the extraction module and the CNN module, which unifies image sizes, improves image classification by fusing multi-scale features, and raises the target recognition rate;
4. The algorithm improves feature extraction, so the details of the reconstructed model are more complete and the accuracy is higher;
5. Compared with traditional algorithms, the method shortens registration time, improves the convergence of registration results, and reduces algorithm complexity.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a block diagram of a CNN module according to the present invention;
FIG. 3 is a diagram of a 3D-LSTM unit structure;
FIG. 4 is a block diagram of a decoder;
FIG. 5 is an optical image;
FIG. 6 is a modeling diagram of the B-Rep algorithm;
FIG. 7 is a modeling diagram of a prior 3D-R2N2 algorithm;
FIG. 8 is a modeling diagram of the 3D-R2N2 algorithm of the present invention;
FIG. 9 is a graph of the reconstruction result of the 3D-R2N2 algorithm of the present invention;
fig. 10 is a reconstruction result diagram of the voxel reconstruction algorithm.
Detailed Description
The following description of the embodiments of the present invention is provided to facilitate understanding by those skilled in the art, but it should be understood that the invention is not limited to the scope of the embodiments; to those skilled in the art, various changes are possible without departing from the spirit and scope of the invention as defined in the appended claims, and everything produced using the inventive concept is protected.
As shown in FIG. 1, the optical building target three-dimensional reconstruction method based on 3D-R2N2 comprises the following steps:
s1, acquiring an optical image and preprocessing the optical image;
s2, constructing a 3D-R2N2 network, and inputting the preprocessed optical image into the constructed 3D-R2N2 network; the 3D-R2N2 network comprises an image extraction module, a pyramid pooling layer, a CNN module and a 3D-LSTM unit which are connected in sequence;
s3, adjusting the size of the input optical image to be a uniform size through the pyramid pooling layer;
S4, extracting features from the uniformly sized optical image by using the deep-residual-variant CNN module of the 3D-R2N2 network, and encoding the extracted features;
S5, performing one-dimensional convolution on the encoded features, and compressing them through the encoder into a 1024-dimensional feature vector, namely the low-dimensional feature vector;
S6, sending the low-dimensional feature vector into the 3D-LSTM unit to obtain a three-dimensional grid structure; wherein the three-dimensional grid structure comprises voxels;
S7, inputting the three-dimensional grid structure into a decoder, and raising the hidden-state resolution of the three-dimensional grid structure through the decoder until the target output resolution is reached;
S8, converting the three-dimensional grid structure that has reached the target output resolution into voxel existence probabilities at the voxel coordinate points by using a cross entropy loss function, and processing the probabilities into Bernoulli distribution form;
S9, assembling the Bernoulli-distributed voxel existence probabilities at the voxel coordinate points into a three-dimensional probability matrix;
and S10, performing voxel reconstruction through the three-dimensional probability matrix, thereby completing the three-dimensional reconstruction of the optical building target, as sketched below.
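Steps S8-S10 reduce, in code, to thresholding the three-dimensional probability matrix into an occupancy grid. Below is a minimal Python sketch of that final conversion; the 32³ resolution and the 0.4 occupancy threshold are illustrative assumptions, not values taken from the patent.

```python
import numpy as np

# Three-dimensional probability matrix from S9: P(voxel occupied) at each
# voxel coordinate point, here random stand-in values on an assumed 32^3 grid.
prob = np.random.rand(32, 32, 32)

# S10: voxel reconstruction - keep the voxels whose Bernoulli occupancy
# probability exceeds an (assumed) threshold of 0.4.
voxels = prob > 0.4
occupied = np.argwhere(voxels)   # (x, y, z) coordinates of the building's voxels
```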
The specific method of preprocessing in step S1 is:
S1-1, according to the formula:

$$\min J(u)=\int_{\Omega}\left|\nabla u\right|\,dx\,dy=\int_{\Omega}\sqrt{\left(\frac{\partial u}{\partial x}\right)^{2}+\left(\frac{\partial u}{\partial y}\right)^{2}}\,dx\,dy$$

obtaining the minimized total variation $J(u)$; wherein $\nabla$ denotes differentiation of $u$, $\Omega$ is the definition domain of the pixel points, $u_{0}$ is the original clear simulated-noise high-frequency image from which $u$ is recovered, $x$ is the $x$ coordinate and $y$ the $y$ coordinate of a pixel in that image, $\nabla u$ is the differential operator applied to $u$, i.e. to the optical image, and $\partial u/\partial x$ and $\partial u/\partial y$ are the derivatives with respect to the pixel coordinates $x$ and $y$;
s1-2, carrying out noise reduction processing on the optical image by utilizing the minimized total variation;
s1-3, processing the optical image after noise reduction into a vertical image, and graying the vertical image to obtain a gray image;
s1-4, extending all the area spaces with concentrated gray scales in the gray scale image to all the gray scale area space ranges to obtain a non-uniform extension and stretching gray scale image;
and S1-5, redistributing the pixel values of the non-uniform extension stretched gray-scale image to finish preprocessing.
wherein $n_{i}$ is the number of pixels of gray level $i$, $n$ is the total number of pixels, and $L$ is the number of gray levels; the pixel values are redistributed according to the formula:

$$s_{k}=(L-1)\sum_{i=0}^{k}\frac{n_{i}}{n},\qquad k=0,1,\dots,L-1$$
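The patent provides no reference code; the following NumPy sketch illustrates steps S1-1 to S1-5 under stated assumptions: the total variation is minimized by gradient descent with a data-fidelity weight `lam` (the step size, iteration count and `lam` are illustrative, not from the patent), and the pixel-value redistribution of S1-5 is implemented as the histogram-equalization mapping above.

```python
import numpy as np

def tv_denoise(u0, lam=0.1, step=0.2, iters=100, eps=1e-8):
    """S1-1/S1-2: reduce noise by gradient descent on the total variation J(u)."""
    u0 = u0.astype(np.float64)
    u = u0.copy()
    for _ in range(iters):
        ux = np.roll(u, -1, axis=1) - u            # forward difference in x
        uy = np.roll(u, -1, axis=0) - u            # forward difference in y
        mag = np.sqrt(ux ** 2 + uy ** 2 + eps)     # |grad u|, regularized
        # divergence of the normalized gradient = descent direction for J(u)
        div = (ux / mag - np.roll(ux / mag, 1, axis=1)
               + uy / mag - np.roll(uy / mag, 1, axis=0))
        u += step * (div - lam * (u - u0))         # TV smoothing + data fidelity
    return u

def equalize(gray, levels=256):
    """S1-4/S1-5: redistribute pixel values via s_k = (L-1) * cumulative n_i/n."""
    hist = np.bincount(gray.ravel(), minlength=levels)
    cdf = hist.cumsum() / gray.size                # accumulated n_i / n
    mapping = np.round((levels - 1) * cdf).astype(np.uint8)
    return mapping[gray]

noisy = np.random.rand(64, 64) * 255               # stand-in grayed image (S1-3)
denoised = tv_denoise(noisy)
stretched = equalize(np.clip(denoised, 0, 255).astype(np.uint8))
```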
As shown in fig. 2, the CNN module in step S2 comprises 12 convolution layers, 5 residual connection layers, 4 bottleneck layers and 1 transition layer; the residual connection layers comprise a first convolution layer, a first Leaky_ReLU activation function layer, a second Leaky_ReLU activation function layer, a third Leaky_ReLU activation function layer and a PC Layer (path control layer); the first to fourth bottleneck layers each comprise a BN normalization layer and a ReLU activation function layer; the first convolution layer uses 7 × 7 convolution kernels, and the second to thirteenth convolution layers each use 3 × 3 convolution kernels;
the first convolution layer, the first Leaky_ReLU activation function layer, the second convolution layer, the third convolution layer, the fourth convolution layer, the fifth convolution layer, the sixth convolution layer, the seventh convolution layer, the eighth convolution layer, the second Leaky_ReLU activation function layer, the ninth convolution layer, the first bottleneck layer, the tenth convolution layer, the second bottleneck layer, the eleventh convolution layer, the third bottleneck layer, the twelfth convolution layer, the fourth bottleneck layer, the thirteenth convolution layer, the transition layer, the third Leaky_ReLU activation function layer and the PC Layer path control layer are connected in sequence; wherein the first convolution layer is the input layer, and the PC Layer path control layer is the output layer;
the output end of the second convolution layer is respectively connected with the input end of the third convolution layer, the input end of the fourth convolution layer, the input end of the fifth convolution layer, the input end of the sixth convolution layer and the input end of the seventh convolution layer; the output end of the third convolution layer is respectively connected with the input end of the fourth convolution layer, the input end of the fifth convolution layer, the input end of the sixth convolution layer and the input end of the seventh convolution layer; the output end of the fourth convolution layer is respectively connected with the input end of the fifth convolution layer, the input end of the sixth convolution layer and the input end of the seventh convolution layer; the output end of the fifth convolution layer is respectively connected with the input end of the sixth convolution layer and the input end of the seventh convolution layer; the output end of the sixth convolution layer is connected with the input end of the seventh convolution layer;
the output end of the ninth convolution layer is respectively connected with the input end of the first bottleneck layer, the input end of the second bottleneck layer, the input end of the third bottleneck layer, the input end of the fourth bottleneck layer and the input end of the transition layer; the output end of the tenth convolution layer is respectively connected with the input end of the second bottleneck layer, the input end of the third bottleneck layer, the input end of the fourth bottleneck layer and the input end of the transition layer; the output end of the eleventh convolution layer is respectively connected with the input end of the third bottleneck layer, the input end of the fourth bottleneck layer and the input end of the transition layer; the output end of the twelfth convolution layer is respectively connected with the input end of the fourth bottleneck layer and the input end of the transition layer; the output end of the thirteenth convolution layer is connected to the input end of the transition layer.
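A PyTorch sketch of the densely connected pattern described above: each 3 × 3 convolution receives the concatenation of all earlier outputs, preceded by a BN + ReLU bottleneck, so even the last layer still sees the first layer's information. The channel counts and growth rate are illustrative assumptions; the patent does not specify them.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, in_ch=64, growth=32, n_layers=5):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(ch),                    # bottleneck: BN + ReLU
                nn.ReLU(inplace=True),
                nn.Conv2d(ch, growth, 3, padding=1),   # 3x3 convolution
            ))
            ch += growth                               # later layers see more maps
        self.out_channels = ch

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            # each convolution consumes the concatenation of every earlier output,
            # so even the last layer obtains the first layer's information
            feats.append(layer(torch.cat(feats, dim=1)))
        return torch.cat(feats, dim=1)

block = DenseBlock()
out = block(torch.randn(1, 64, 56, 56))                # out has block.out_channels maps
```

Concatenation (rather than summation) is what lets every later layer reuse all earlier feature maps, which is the channel-reuse property the description claims.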
As shown in fig. 3, the 3D-LSTM unit in step S2 comprises several basic modules; each basic module comprises a current-input fully connected layer, the output end of which is connected respectively with the input ends of the first, second and third fully connected layers; the output end of the first fully connected layer and the output end of the first 3 × 3 × 3 convolution layer are connected to the input ends of the first adder; the output end of the second fully connected layer and the output end of the second 3 × 3 × 3 convolution layer are connected to the input ends of the second adder; the output end of the third fully connected layer and the output end of the third 3 × 3 × 3 convolution layer are connected to the input ends of the third adder; the output ends of the second adder and the third adder are respectively connected with the input ends of the first multiplier, and the output end of the first multiplier is connected with an input end of the fourth adder; the input ends of the first, second and third 3 × 3 × 3 convolution layers are each connected with the output end of the hidden layer at the previous moment; the output end of the first adder is connected with one input end of the second multiplier, whose other input end is connected with the storage unit at the previous moment; the output end of the second multiplier is connected with another input end of the fourth adder; and the output end of the fourth adder is connected respectively with the hidden layer at the current moment and the storage unit at the current moment.
The gate equations governing the 3D-LSTM unit are:

$$o_t=\sigma\left(W_o T\left(x_t\right)+U_o * h_{t-1}+b_o\right)$$
$$i_t=\sigma\left(W_i T\left(x_t\right)+U_i * h_{t-1}+b_i\right)$$
$$s_t=o_t\odot s_{t-1}+i_t\odot\tanh\left(W_s T\left(x_t\right)+U_s * h_{t-1}+b_s\right)$$
$$h_t=\tanh\left(s_t\right)$$

wherein $o_t$ is the output gate at time $t$, $\sigma$ is the sigmoid function, $W_o$ is the weight matrix of the output gate, $T(\cdot)$ is the input function, $x_t$ is the input at time $t$, $U_o$ is the hidden-transition matrix of the output gate, $*$ is the convolution operation, $h_{t-1}$ is the hidden state at time $t-1$, and $b_o$ is the output gate offset; $i_t$ is the input gate, $W_i$ is the weight matrix of the input gate, $U_i$ is the hidden-state matrix of the input gate, and $b_i$ is the offset of the input gate; $\odot$ is element-wise multiplication, $s_{t-1}$ is the storage unit at time $t-1$, $\tanh$ is the activation function, $W_s$ is the weight matrix of the storage unit, $U_s$ is the hidden-state matrix of the storage unit, $b_s$ is the storage unit offset, and $h_t$ is the hidden unit.
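The gate equations translate directly into code. The sketch below assumes a 4 × 4 × 4 voxel grid, 128 hidden channels, and the 1024-dimensional input vector of step S5; the fully connected layers play the role of T(x_t) and the 3 × 3 × 3 convolutions transform h_{t-1}, as in the wiring above.

```python
import torch
import torch.nn as nn

class LSTM3DCell(nn.Module):
    def __init__(self, in_dim=1024, hid=128, grid=4):
        super().__init__()
        self.hid, self.grid = hid, grid
        # three fully connected layers: the input function T(x_t) mapped onto the grid
        self.fc = nn.ModuleList(nn.Linear(in_dim, hid * grid ** 3) for _ in range(3))
        # three 3x3x3 convolutions acting on the previous hidden state h_{t-1}
        self.conv = nn.ModuleList(nn.Conv3d(hid, hid, 3, padding=1) for _ in range(3))

    def forward(self, x, h_prev, s_prev):
        g = self.grid
        pre = [fc(x).view(-1, self.hid, g, g, g) + cv(h_prev)   # the three adders
               for fc, cv in zip(self.fc, self.conv)]
        o = torch.sigmoid(pre[0])                 # output gate o_t
        i = torch.sigmoid(pre[1])                 # input gate i_t
        s = o * s_prev + i * torch.tanh(pre[2])   # storage unit s_t
        h = torch.tanh(s)                         # hidden state h_t
        return h, s

cell = LSTM3DCell()
x = torch.randn(2, 1024)                          # 1024-d feature vector from S5
h = torch.zeros(2, 128, 4, 4, 4)
s = torch.zeros(2, 128, 4, 4, 4)
h, s = cell(x, h, s)
```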
The pyramid pooling layers in step S2 include a feature map 16-block division layer, a feature map 4-block division layer, and a feature map 1-block division layer.
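A sketch of how the 16-block / 4-block / 1-block division can be realized with adaptive pooling in PyTorch, so that inputs of arbitrary size yield one fixed-length vector (step S3). Treating each division layer as adaptive max pooling is an implementation assumption.

```python
import torch
import torch.nn.functional as F

def pyramid_pool(feat):
    """feat: (N, C, H, W) with arbitrary H, W -> fixed (N, C * (16 + 4 + 1))."""
    pooled = [F.adaptive_max_pool2d(feat, g).flatten(1)  # 4x4, 2x2, 1x1 grids
              for g in (4, 2, 1)]
    return torch.cat(pooled, dim=1)

a = pyramid_pool(torch.randn(1, 64, 100, 90))
b = pyramid_pool(torch.randn(1, 64, 37, 55))
assert a.shape == b.shape    # uniform output size regardless of input size
```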
As shown in fig. 4, the decoder in step S7 includes several sets of two-dimensional vector matrices, a pooling layer, a one-dimensional vector matrix, a fully connected layer and a Softmax activation layer.
The formula of the cross entropy loss function in step S8 is:

$$L=-\sum_{(i,j,k)}\left[y_{(i,j,k)}\log p_{(i,j,k)}+\left(1-y_{(i,j,k)}\right)\log\left(1-p_{(i,j,k)}\right)\right]$$

wherein $L$ is the cross-entropy loss function, $(i,j,k)$ is a voxel coordinate point, $y_{(i,j,k)}$ is the real voxel point, $p_{(i,j,k)}$ is the voxel probability, and $\log$ is the logarithmic function with base 10.
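A NumPy sketch of this loss, written with base-10 logarithms as stated; `y` is the ground-truth occupancy grid and `p` the predicted Bernoulli probabilities. The clipping constant guards against log10(0) and is an implementation detail, not part of the patent.

```python
import numpy as np

def voxel_cross_entropy(p, y, eps=1e-12):
    """Voxel-wise cross entropy with base-10 logs, as in the formula above."""
    p = np.clip(p, eps, 1.0 - eps)     # guard against log10(0)
    return -np.sum(y * np.log10(p) + (1.0 - y) * np.log10(1.0 - p))

y = (np.random.rand(32, 32, 32) > 0.5).astype(float)   # ground-truth occupancy
p = np.random.rand(32, 32, 32)                         # predicted probabilities
loss = voxel_cross_entropy(p, y)
```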
In one embodiment of the invention, the intersection-over-union between the three-dimensional reconstruction output by the network and the real model is evaluated with the IoU value, which serves as the evaluation criterion of three-dimensional reconstruction accuracy:

$$IoU=\frac{\left|p\cap y\right|}{\left|p\cup y\right|}=\frac{\sum_{i}I\left(p_{i}\right)\,I\left(y_{i}\right)}{\sum_{i}I\left(I\left(p_{i}\right)+I\left(y_{i}\right)\right)}$$

wherein $p$ is the predicted value, $\cap$ denotes the intersection, $y$ is the true value, $\cup$ denotes the union, $I(\cdot)$ is the indicator function, $p_{i}$ is the $i$-th voxel of the predicted value, and $y_{i}$ is the $i$-th voxel of the true value.
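A NumPy sketch of the IoU criterion: predictions are binarized by the indicator function and compared voxel by voxel with the ground truth. The 0.5 binarization threshold is an assumption; the patent does not state one.

```python
import numpy as np

def voxel_iou(p, y, thresh=0.5):
    """IoU between predicted probabilities p and ground-truth occupancy y."""
    pred = p > thresh                       # I(p_i): predicted occupancy
    true = y.astype(bool)                   # I(y_i): real occupancy
    inter = np.logical_and(pred, true).sum()
    union = np.logical_or(pred, true).sum()
    return inter / union if union else 1.0

iou = voxel_iou(np.random.rand(32, 32, 32), np.random.rand(32, 32, 32) > 0.5)
```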
As shown in fig. 5, fig. 6, fig. 7, fig. 8 and table 1, the details of the reconstruction model produced by the method of the present invention are more complete, i.e., the accuracy is higher; the IoU value of the improved 3D-R2N2 algorithm is higher than those of the other two algorithms, so it performs better on the three-dimensional reconstruction model: its reconstruction accuracy is 7.8% higher than that of the B-Rep algorithm and 5.3% higher than that of the original 3D-R2N2 algorithm.
TABLE 1
In another embodiment of the invention, a teaching building is used as the test object to examine the three-dimensional building target reconstruction performance of the improved 3D-R2N2 algorithm. Three-dimensional reconstruction of the image is performed in turn with the 3D-R2N2 algorithm and the voxel-level reconstruction algorithm; the results are shown in FIGS. 9 and 10. The registration time on the optical image data and the convergence of the registration results are then simulated in turn, with the comparison shown in table 2. The registration time of both algorithms grows in proportion to the data scale: as the data scale increases, the registration time increases, but the voxel reconstruction algorithm takes longer on each batch of image data than the 3D-R2N2 algorithm. Analysis of the results shows that the 3D-R2N2 algorithm is 3.2% faster than the voxel reconstruction algorithm in three-dimensional image reconstruction. The convergence of the registration results of the two algorithms is analyzed next: the convergence of the 3D-R2N2 algorithm is stronger than that of the voxel reconstruction algorithm in every trial, and the 3D-R2N2 algorithm reduces algorithm complexity by 15.1% compared with the voxel reconstruction algorithm in three-dimensional image reconstruction.
TABLE 2
The invention designs a densely connected CNN module in which every convolution layer is connected to the subsequent convolution layers, so that each layer adds its own feature maps, later layers obtain the information of earlier layers, and even the last convolution layer obtains the information of the first, making full use of the channels and completing the information transmission. A bottleneck layer structure is added between the convolution layers of the densely connected CNN module, which reduces the dimensionality of the convolution results and alleviates the dimensionality growth caused by each convolution layer receiving a large amount of feature information; with reduced dimensionality, training is stable and coding efficiency is improved. A pyramid pooling layer is added between the extraction module and the CNN module, which unifies image sizes, improves image classification by fusing multi-scale features, and raises the target recognition rate. The algorithm improves feature extraction, so the details of the reconstructed model are more complete and the accuracy is higher. Compared with traditional algorithms, the method shortens registration time, improves the convergence of registration results, and reduces algorithm complexity.
Claims (6)
1. A three-dimensional reconstruction method of an optical building target based on 3D-R2N2 is characterized by comprising the following steps:
s1, acquiring an optical image and preprocessing the optical image;
s2, constructing a 3D-R2N2 network, and inputting the preprocessed optical image into the constructed 3D-R2N2 network; the 3D-R2N2 network comprises an image extraction module, a pyramid pooling layer, a CNN module and a 3D-LSTM unit which are connected in sequence;
s3, adjusting the size of the input optical image to be a uniform size through the pyramid pooling layer;
S4, extracting features from the uniformly sized optical image by using the deep-residual-variant CNN module of the 3D-R2N2 network, and encoding the extracted features;
S5, performing one-dimensional convolution on the encoded features, and compressing them through the encoder into a 1024-dimensional feature vector, namely the low-dimensional feature vector;
S6, sending the low-dimensional feature vector into the 3D-LSTM unit to obtain a three-dimensional grid structure; wherein the three-dimensional grid structure comprises voxels;
S7, inputting the three-dimensional grid structure into a decoder, and raising the hidden-state resolution of the three-dimensional grid structure through the decoder until the target output resolution is reached;
S8, converting the three-dimensional grid structure that has reached the target output resolution into voxel existence probabilities at the voxel coordinate points by using a cross entropy loss function, and processing the probabilities into Bernoulli distribution form;
S9, assembling the Bernoulli-distributed voxel existence probabilities at the voxel coordinate points into a three-dimensional probability matrix;
S10, performing voxel reconstruction through the three-dimensional probability matrix, thereby completing the three-dimensional reconstruction of the optical building target;
the CNN module in step S2 includes 12 convolutional layers, 5 residual connection layers, 4 bottleneck layers, and 1 transition Layer, where the residual connection layers include a first convolutional Layer, a first leak _ Relu activation function Layer, a second leak _ Relu activation function Layer, a third leak _ Relu activation function Layer, and a PC Layer path control Layer, the first bottleneck Layer to the fourth bottleneck Layer include a BN normalization Layer and a ReLU activation function Layer, the first convolutional Layer has a 7 × 7 structure, and the second convolutional Layer to the thirteenth convolutional Layer have a 3 × 3 structure;
the first convolution Layer, the first Leaky _ Relu activation function Layer, the second convolution Layer, the third convolution Layer, the fourth convolution Layer, the fifth convolution Layer, the sixth convolution Layer, the seventh convolution Layer, the eighth convolution Layer, the second Leaky _ Relu activation function Layer, the ninth convolution Layer, the first bottleneck Layer, the tenth convolution Layer, the second bottleneck Layer, the eleventh convolution Layer, the third bottleneck Layer, the twelfth convolution Layer, the fourth bottleneck Layer, the thirteenth convolution Layer, the transition Layer, the third Leaky _ Relu activation function Layer and the PC Layer access control Layer are connected in sequence; wherein the first convolution Layer is an input Layer, and the PC Layer channel control Layer is an output Layer;
the output end of the second convolution layer is respectively connected with the input end of the third convolution layer, the input end of the fourth convolution layer, the input end of the fifth convolution layer, the input end of the sixth convolution layer and the input end of the seventh convolution layer; the output end of the third convolution layer is respectively connected with the input end of the fourth convolution layer, the input end of the fifth convolution layer, the input end of the sixth convolution layer and the input end of the seventh convolution layer; the output end of the fourth convolution layer is respectively connected with the input end of the fifth convolution layer, the input end of the sixth convolution layer and the input end of the seventh convolution layer; the output end of the fifth convolution layer is respectively connected with the input end of the sixth convolution layer and the input end of the seventh convolution layer; the output end of the sixth convolution layer is connected with the input end of the seventh convolution layer;
the output end of the ninth convolution layer is respectively connected with the input end of the first bottleneck layer, the input end of the second bottleneck layer, the input end of the third bottleneck layer, the input end of the fourth bottleneck layer and the input end of the transition layer; the output end of the tenth convolution layer is respectively connected with the input end of the second bottleneck layer, the input end of the third bottleneck layer, the input end of the fourth bottleneck layer and the input end of the transition layer; the output end of the eleventh convolution layer is respectively connected with the input end of the third bottleneck layer, the input end of the fourth bottleneck layer and the input end of the transition layer; the output end of the twelfth convolution layer is respectively connected with the input end of the fourth bottleneck layer and the input end of the transition layer; the output end of the thirteenth convolution layer is connected to the input end of the transition layer.
2. The 3D-R2N2-based optical building target three-dimensional reconstruction method according to claim 1, wherein the preprocessing in step S1 comprises:
S1-1, according to the formula:

$$\min J(u)=\int_{\Omega}\left|\nabla u\right|\,dx\,dy=\int_{\Omega}\sqrt{\left(\frac{\partial u}{\partial x}\right)^{2}+\left(\frac{\partial u}{\partial y}\right)^{2}}\,dx\,dy$$

obtaining the minimized total variation $J(u)$; wherein $\nabla$ denotes differentiation of $u$, $\Omega$ is the definition domain of the pixel points, $u_{0}$ is the original clear simulated-noise high-frequency image from which $u$ is recovered, $x$ is the $x$ coordinate and $y$ the $y$ coordinate of a pixel in that image, $\nabla u$ is the differential operator applied to $u$, i.e. to the optical image, and $\partial u/\partial x$ and $\partial u/\partial y$ are the derivatives with respect to the pixel coordinates $x$ and $y$;
s1-2, carrying out noise reduction processing on the optical image by utilizing the minimized total variation;
s1-3, processing the optical image after noise reduction into a vertical image, and graying the vertical image to obtain a gray image;
s1-4, extending all the area spaces with concentrated gray scales in the gray scale image to all the gray scale area space ranges to obtain a non-uniform extension and stretching gray scale image;
and S1-5, redistributing the pixel values of the non-uniform extension stretched gray-scale image to finish preprocessing.
3. The 3D-R2N2-based optical building target three-dimensional reconstruction method according to claim 1, wherein the 3D-LSTM unit in step S2 comprises several basic modules; each basic module comprises a current-input fully connected layer, the output end of which is connected respectively with the input ends of the first, second and third fully connected layers; the output end of the first fully connected layer and the output end of the first 3 × 3 × 3 convolution layer are connected to the input ends of the first adder; the output end of the second fully connected layer and the output end of the second 3 × 3 × 3 convolution layer are connected to the input ends of the second adder; the output end of the third fully connected layer and the output end of the third 3 × 3 × 3 convolution layer are connected to the input ends of the third adder; the output ends of the second adder and the third adder are respectively connected with the input ends of the first multiplier, and the output end of the first multiplier is connected with an input end of the fourth adder; the input ends of the first, second and third 3 × 3 × 3 convolution layers are each connected with the output end of the hidden layer at the previous moment; the output end of the first adder is connected with one input end of the second multiplier, whose other input end is connected with the storage unit at the previous moment; the output end of the second multiplier is connected with another input end of the fourth adder; and the output end of the fourth adder is connected respectively with the hidden layer at the current moment and the storage unit at the current moment.
4. The 3D-R2N2-based optical building target three-dimensional reconstruction method according to claim 1, wherein the pyramid pooling layers in step S2 include a feature map 16-block partition layer, a feature map 4-block partition layer and a feature map 1-block partition layer.
5. The 3D-R2N2-based optical building target three-dimensional reconstruction method according to claim 1, wherein the decoder in step S7 includes several sets of two-dimensional vector matrices, a pooling layer, a one-dimensional vector matrix, a fully connected layer and a Softmax activation layer.
6. The 3D-R2N2-based optical building target three-dimensional reconstruction method according to claim 1, wherein the formula of the cross entropy loss function in step S8 is:

$$L=-\sum_{(i,j,k)}\left[y_{(i,j,k)}\log p_{(i,j,k)}+\left(1-y_{(i,j,k)}\right)\log\left(1-p_{(i,j,k)}\right)\right]$$

wherein $L$ is the cross-entropy loss function, $(i,j,k)$ is a voxel coordinate point, $y_{(i,j,k)}$ is the real voxel point, $p_{(i,j,k)}$ is the voxel probability, and $\log$ is the logarithmic function with base 10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111409413.1A CN113822825B (en) | 2021-11-25 | 2021-11-25 | Optical building target three-dimensional reconstruction method based on 3D-R2N2 |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111409413.1A CN113822825B (en) | 2021-11-25 | 2021-11-25 | Optical building target three-dimensional reconstruction method based on 3D-R2N2 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113822825A CN113822825A (en) | 2021-12-21 |
CN113822825B true CN113822825B (en) | 2022-02-11 |
Family
ID=78918240
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111409413.1A Expired - Fee Related CN113822825B (en) | 2021-11-25 | 2021-11-25 | Optical building target three-dimensional reconstruction method based on 3D-R2N2 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113822825B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116310844B (en) * | 2023-05-18 | 2023-07-28 | 四川凯普顿信息技术股份有限公司 | Agricultural crop growth monitoring system |
CN116958455B (en) * | 2023-09-21 | 2023-12-26 | 北京飞渡科技股份有限公司 | Roof reconstruction method and device based on neural network and electronic equipment |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10460511B2 (en) * | 2016-09-23 | 2019-10-29 | Blue Vision Labs UK Limited | Method and system for creating a virtual 3D model |
CN109147048B (en) * | 2018-07-23 | 2021-02-26 | 复旦大学 | Three-dimensional mesh reconstruction method by utilizing single-sheet colorful image |
- 2021-11-25: CN application CN202111409413.1A granted as patent CN113822825B; status: not active (Expired - Fee Related)
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101482971A (en) * | 2009-02-23 | 2009-07-15 | 公安部第一研究所 | Non-uniform correction method for compensation of low-gray scale X-ray image signal |
CN102831573A (en) * | 2012-08-14 | 2012-12-19 | 电子科技大学 | Linear stretching method of infrared image |
CN103327245A (en) * | 2013-06-07 | 2013-09-25 | 电子科技大学 | Automatic focusing method of infrared imaging system |
CN103337053A (en) * | 2013-06-13 | 2013-10-02 | 华中科技大学 | Switching non-local total variation based filtering method for image polluted by salt and pepper noise |
CN104143101A (en) * | 2014-07-01 | 2014-11-12 | 华南理工大学 | Method for automatically identifying breast tumor area based on ultrasound image |
CN105954994A (en) * | 2016-06-30 | 2016-09-21 | 深圳先进技术研究院 | Image enhancement method for lensless digital holography microscopy imaging |
CN106251315A (en) * | 2016-08-23 | 2016-12-21 | 南京邮电大学 | A kind of image de-noising method based on full variation |
CN106355561A (en) * | 2016-08-30 | 2017-01-25 | 天津大学 | TV (total variation) image noise removal method based on noise priori constraint |
CN108737298A (en) * | 2018-04-04 | 2018-11-02 | 东南大学 | A kind of SCMA blind checking methods based on image procossing |
CN112785684A (en) * | 2020-11-13 | 2021-05-11 | 北京航空航天大学 | Three-dimensional model reconstruction method based on local information weighting mechanism |
CN112767272A (en) * | 2021-01-20 | 2021-05-07 | 南京信息工程大学 | Weight self-adaptive mixed-order fully-variable image denoising algorithm |
Non-Patent Citations (3)
Title |
---|
"3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction";Christopher B. Choy等;《arXiv》;20160405;正文第1-17页 * |
"Deep Residual Learning for Image Recognition";Kaiming He等;《2016 IEEE Conference on Computer Vision and Pattern Recognition》;20161212;第770-778页 * |
"改进 ORB-SLAM 算法在户外离线即时导航的研究";邹倩颖等;《实验室研究与探索》;20190930;第38卷(第9期);第73-78页 * |
Also Published As
Publication number | Publication date |
---|---|
CN113822825A (en) | 2021-12-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110390638B (en) | High-resolution three-dimensional voxel model reconstruction method | |
CN108875813B (en) | Three-dimensional grid model retrieval method based on geometric image | |
CN109410321A (en) | Three-dimensional rebuilding method based on convolutional neural networks | |
CN112396703A (en) | Single-image three-dimensional point cloud model reconstruction method | |
CN113822825B (en) | Optical building target three-dimensional reconstruction method based on 3D-R2N2 | |
CN108921926A (en) | A kind of end-to-end three-dimensional facial reconstruction method based on single image | |
CN112818764B (en) | Low-resolution image facial expression recognition method based on feature reconstruction model | |
CN113159232A (en) | Three-dimensional target classification and segmentation method | |
CN104077742B (en) | Human face sketch synthetic method and system based on Gabor characteristic | |
CN113436237B (en) | High-efficient measurement system of complicated curved surface based on gaussian process migration learning | |
CN107301643B (en) | Well-marked target detection method based on robust rarefaction representation Yu Laplce's regular terms | |
CN113962858A (en) | Multi-view depth acquisition method | |
CN116958420A (en) | High-precision modeling method for three-dimensional face of digital human teacher | |
Zhu et al. | Nonlocal low-rank point cloud denoising for 3-D measurement surfaces | |
CN116721216A (en) | Multi-view three-dimensional reconstruction method based on GCF-MVSNet network | |
CN113011506B (en) | Texture image classification method based on deep fractal spectrum network | |
CN114782564A (en) | Point cloud compression method and device, electronic equipment and storage medium | |
CN112581626B (en) | Complex curved surface measurement system based on non-parametric and multi-attention force mechanism | |
CN104299201A (en) | Image reconstruction method based on heredity sparse optimization and Bayes estimation model | |
CN112215241B (en) | Image feature extraction device based on small sample learning | |
CN104917532A (en) | Face model compression method | |
CN116993760A (en) | Gesture segmentation method, system, device and medium based on graph convolution and attention mechanism | |
CN112767539B (en) | Image three-dimensional reconstruction method and system based on deep learning | |
CN116758219A (en) | Region-aware multi-view stereo matching three-dimensional reconstruction method based on neural network | |
CN110675381A (en) | Intrinsic image decomposition method based on serial structure network |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
| CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20220211 |