CN111652966B - Three-dimensional reconstruction method and device based on multiple visual angles of unmanned aerial vehicle - Google Patents

Three-dimensional reconstruction method and device based on multiple visual angles of unmanned aerial vehicle

Info

Publication number
CN111652966B
CN111652966B
Authority
CN
China
Prior art keywords
dimensional
depth
layer
map
scene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010393797.1A
Other languages
Chinese (zh)
Other versions
CN111652966A (en)
Inventor
曹先彬
罗晓燕
杜文博
张旭东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CHECC Data Co Ltd
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202010393797.1A priority Critical patent/CN111652966B/en
Publication of CN111652966A publication Critical patent/CN111652966A/en
Application granted granted Critical
Publication of CN111652966B publication Critical patent/CN111652966B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration
    • G06T5/50 - Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/50 - Depth or shape recovery
    • G06T7/55 - Depth or shape recovery from multiple images
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10028 - Range image; Depth image; 3D point clouds

Abstract

The invention discloses a three-dimensional reconstruction method and device based on multiple visual angles of an unmanned aerial vehicle, belonging to the technical field of computer image processing. The method comprises the following steps: multi-view two-dimensional images of a scene captured by unmanned aerial vehicle aerial photography are input into a three-dimensional reconstruction model, an optimized depth map is obtained for each view angle, and the optimized depth maps from all view angles are fused into a three-dimensional point cloud of the scene. The three-dimensional reconstruction model extracts a feature map from each image, performs homography transformation and constructs a cost matrix, generates a depth probability distribution map, regresses it into an initial depth map, fuses the initial depth map with the reference image, and feeds the result into a depth residual learning network to optimize the depth map. The device comprises a processor and a memory; the memory stores a computer program implementing the three-dimensional reconstruction method, and the processor executes the computer program to carry out scene three-dimensional reconstruction. The method and the device reduce the time consumption and resource occupation of three-dimensional scene reconstruction and achieve faster and more accurate reconstruction.

Description

Three-dimensional reconstruction method and device based on multiple visual angles of unmanned aerial vehicle
Technical Field
The invention relates to the technical field of computer image processing, in particular to a method and a device for reconstructing a three-dimensional scene.
Background
With the development of information technology and the demand for constructing real-world three-dimensional scenes, three-dimensional reconstruction technology has been widely applied in fields such as military exploration, urban planning and virtual reality. At present, considering factors such as flexibility, cost and convenience, restoring a three-dimensional scene from two-dimensional images captured by visual sensors such as cameras has become the mainstream approach in academia and industry. Unmanned aerial vehicles are receiving increasing attention and application; they offer advantages such as large-scale coverage and wide aerial viewing angles, but the images they capture are only two-dimensional and contain no depth information, so they can hardly be used directly to recover a three-dimensional scene.
In recent years, convolutional neural networks have shown strong capability in computer vision tasks such as two-dimensional feature extraction, and more and more researchers have applied neural networks to three-dimensional reconstruction tasks with some success. SurfaceNet, proposed at ICCV 2017, performs voxel-based three-dimensional reconstruction: it first divides the three-dimensional scene into a spatial grid and then estimates whether each voxel belongs to the surface of the scene, thereby reconstructing the whole scene. A point-cloud-based three-dimensional reconstruction method was proposed in TPAMI 2010; it operates directly on points in three-dimensional space and relies on an update strategy to progressively densify the point set and reconstruct the scene, but the reconstruction proceeds sequentially with dependencies between steps and is difficult to parallelize, so the whole reconstruction process is too time-consuming.
Disclosure of Invention
In order to reduce the time consumption of current three-dimensional scene reconstruction and the high memory occupation of the associated computation, the invention provides a three-dimensional reconstruction method and device based on multiple visual angles of an unmanned aerial vehicle, which combine unmanned aerial vehicle aerial photography with convolutional-neural-network-based three-dimensional reconstruction to reconstruct three-dimensional scenes faster and more accurately.
The invention discloses a three-dimensional reconstruction method based on multiple visual angles of an unmanned aerial vehicle, which comprises the following steps:
acquiring multi-view two-dimensional images of the scene to be three-dimensionally reconstructed by unmanned aerial vehicle aerial photography, and selecting one of the images as a reference image;
processing the multi-view two-dimensional images as the input of a three-dimensional reconstruction model; firstly, extracting a two-dimensional feature map from each two-dimensional image through a two-dimensional convolutional neural network; transforming each feature map through homography onto a plane parallel to the reference image, and constructing a cost matrix from all the feature maps after homography transformation; secondly, generating a depth probability distribution map from the cost matrix through a three-dimensional convolutional neural network with a multi-scale structure, and regressing the depth probability distribution map into an initial depth map through an entropy operation; then fusing the initial depth map with the reference image, inputting the result into a depth residual learning network, and outputting an optimized depth map;
training the three-dimensional reconstruction model to optimize the neural networks within it; the loss function used during training is the sum of the first-order norms between the calibrated real depth map and, respectively, the initial depth map and the optimized depth map; each training sample is a set of multi-view two-dimensional images, and the label is the real depth map of the scene;
after the three-dimensional reconstruction model is trained, taking the two-dimensional images at different view angles in turn as the reference image and inputting the multi-view two-dimensional images into the three-dimensional reconstruction model to obtain the optimized depth map at the view angle corresponding to each reference image; and finally fusing the optimized depth maps at all view angles to obtain the final three-dimensional point cloud of the scene.
The three-dimensional reconstruction model extracts features from the input two-dimensional images with an eight-layer convolutional neural network: after every three layers the translation step of the filter changes from 1 to 2, and every layer except the last is followed by batch normalization and a ReLU activation function. The feature map after the eight-layer convolutional neural network is one quarter of the input image size, corresponding to a downsampling scale of 4. Although the features are downsampled during extraction, the context information of the original input image is preserved in the convolutional neural network.
In the three-dimensional reconstruction model, a homography transformation maps one plane onto another; this operation serves as the bridge from two dimensions to three dimensions. After the feature maps of the input images at different view angles have been homography-transformed, they are combined into a cost matrix by a variance operation.
In the three-dimensional reconstruction model, the three-dimensional convolutional neural network with a multi-scale structure uses an encoder-decoder-like architecture in which each level scales and fuses the feature map, finally transforming the cost matrix into the probability distribution of the depth map, namely the depth probability distribution map.
In the three-dimensional reconstruction model, when regressing the initial depth map, the depth probability distribution of each pixel is read from the depth probability distribution map, the four depth values closest to the peak are selected for the entropy calculation, and each depth value is multiplied by its corresponding probability and the products are summed to obtain the depth of the pixel in the initial depth map.
In the three-dimensional reconstruction model, because the obtained initial depth map is over-smoothed, a reference image is introduced: the initial depth map and the reference image are fused into a 4-channel input, which is fed into a depth residual learning network that outputs the optimized depth map. The depth residual learning network consists of a two-dimensional convolutional neural network with three 32-channel layers and one 1-channel layer; in order to learn negative residual values, the last layer does not include a batch normalization layer or a ReLU layer.
The invention also relates to an unmanned-aerial-vehicle-based multi-view three-dimensional reconstruction device comprising a processor and a memory; a computer program implementing the unmanned aerial vehicle multi-view three-dimensional reconstruction method is stored in the memory, and the processor executes the computer program stored in the memory to perform the three-dimensional reconstruction of the scene.
Compared with the prior art, the three-dimensional reconstruction method and the three-dimensional reconstruction device have the following advantages and positive effects:
(1) The input images for three-dimensional reconstruction are more flexible: they are not limited to the two view angles of a binocular camera, and unlike earlier three-dimensional reconstruction algorithms the method does not require a fixed number of input images, so aerial images from any number of arbitrary view angles can be used as input for three-dimensional scene reconstruction.
(2) The method converts the three-dimensional reconstruction task into estimating a depth map for each view angle of the unmanned aerial vehicle and then fusing the depth maps into the final three-dimensional point cloud, which reduces the amount of computation and makes the whole scene reconstruction process more efficient. At the same time, the number of parameters during model training is greatly reduced, training is faster, and a trained three-dimensional reconstruction model can be obtained quickly.
(3) The invention provides a more refined and effective three-dimensional reconstruction model structure. It exploits the geometric relationship between the multi-view images and the corresponding cameras, extracts features with dense matching and neural networks, and introduces a three-dimensional convolutional neural network with a new encoding-decoding structure, so that global semantic information is incorporated when reconstructing the scene, stereo matching is stronger, and both the speed and the accuracy of three-dimensional scene reconstruction are improved.
Drawings
FIG. 1 is a flow chart of a three-dimensional reconstruction method of the present invention;
FIG. 2 is a block diagram of a three-dimensional convolutional neural network of the multi-scale structure of the present invention;
FIG. 3 is a two-dimensional convolutional neural network structure of the present invention;
fig. 4 is a schematic structural diagram of a system for three-dimensional reconstruction of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, the three-dimensional reconstruction method based on multiple perspectives of an unmanned aerial vehicle according to the embodiment of the present invention is divided into the following seven steps for explanation.
Step 1, acquiring multi-view two-dimensional images of the scene of interest by unmanned aerial vehicle aerial photography.
For a scene to be three-dimensionally reconstructed, the invention acquires two-dimensional images from multiple view angles of the scene by unmanned aerial vehicle aerial photography. In the embodiment of the invention, the captured images are scene images taken from 7 different angles; 1 of them is selected as the reference image, and the three-dimensional scene is reconstructed with respect to the shooting angle of the reference image. The captured images are sampled and cropped so that the image size becomes 640 × 512 pixels.
Step 2, inputting the captured two-dimensional images into a two-dimensional convolutional neural network to extract feature information.
As shown in fig. 3, the embodiment of the invention uses an eight-layer convolutional neural network to extract features from each image. Each layer has 32 channels and a 3 × 3 filter; except for the last layer, a BN (batch normalization) layer and a ReLU layer follow each layer. After every three layers the filter stride changes from 1 to 2 and the feature map size is halved, so the final feature map is one quarter of the original image, corresponding to a downsampling scale of 4. Meanwhile, all input images in a group share the network parameters during back propagation. Although the features are downsampled during extraction, the context information of the original input image is preserved in the convolutional neural network.
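To make the feature extractor concrete, the following is a minimal PyTorch sketch of the eight-layer network described above. PyTorch itself, the 3-channel input, and placing the two stride-2 convolutions at layers 3 and 6 are assumptions made for illustration, not details stated in the patent.

```python
# Minimal sketch of the eight-layer 2D feature extractor (assumptions noted above).
import torch
import torch.nn as nn

def conv_bn_relu(in_ch, out_ch, stride):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class FeatureNet(nn.Module):
    """Eight 3x3 conv layers, 32 channels each, BN + ReLU on every layer except
    the last; two stride-2 layers bring the output to 1/4 of the input size."""
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            conv_bn_relu(3, 32, 1),      # layer 1
            conv_bn_relu(32, 32, 1),     # layer 2
            conv_bn_relu(32, 32, 2),     # layer 3: stride 2 -> 1/2 resolution
            conv_bn_relu(32, 32, 1),     # layer 4
            conv_bn_relu(32, 32, 1),     # layer 5
            conv_bn_relu(32, 32, 2),     # layer 6: stride 2 -> 1/4 resolution
            conv_bn_relu(32, 32, 1),     # layer 7
            nn.Conv2d(32, 32, 3, 1, 1),  # layer 8: no BN / ReLU
        )

    def forward(self, x):                # x: (B, 3, H, W)
        return self.layers(x)            # (B, 32, H/4, W/4)

# Example: a 640 x 512 aerial image yields a 160 x 128 feature map.
feat = FeatureNet()(torch.randn(1, 3, 512, 640))
print(feat.shape)  # torch.Size([1, 32, 128, 160])
```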
Step 3, performing homography transformation on the extracted two-dimensional feature maps and constructing a cost matrix.
The homography transformation nonlinearly interpolates the extracted planar features using operations involving the camera parameters, rotation, inversion and the like, mapping one plane onto another. The homography transformation is the intermediate bridge connecting two dimensions to three dimensions; in addition, it is differentiable, which facilitates end-to-end training.
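The patent does not write the homography out explicitly. A common plane-sweep form consistent with this description (an assumption borrowed from the multi-view stereo literature), with reference camera parameters $\{K_1, R_1, t_1\}$ and the i-th camera $\{K_i, R_i, t_i\}$, warps the i-th feature map onto the fronto-parallel plane at depth $d$ of the reference view via

$$H_i(d) = K_i \, R_i \left( I - \frac{(t_1 - t_i)\, n_1^{\mathsf{T}}}{d} \right) R_1^{\mathsf{T}} \, K_1^{-1},$$

where $n_1$ is the principal axis of the reference camera; for the reference view itself, $H_1(d)$ reduces to the identity.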
In this step, the feature maps of the input images at different view angles are transformed by homography onto a plane parallel to the reference image; the transformed map size is (W/4) · (H/4) · D · C, where W and H are the width and height of the input image, D is the depth (the number of sampled depth values), and C is the number of feature channels. The homography-transformed maps from the different view angles are then combined into a cost matrix by the variance operation below; compared with a mean operation, the variance operation better incorporates the difference information between the images, making the final reconstruction more accurate.
The feature maps after homography transformation are combined into the cost matrix E as

$$E = \frac{1}{N}\sum_{i=1}^{N}\left(V_i - \bar{V}\right)^{2}$$

where each map corresponds to a feature matrix, N is the number of combined feature maps, $V_i$ is the i-th feature map after homography transformation, and $\bar{V}$ is the average matrix of the N feature maps.
In this step, the feature maps of the input images at different view angles are homography-transformed and assembled into a cost matrix; this process is essentially an implementation of dense matching.
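A minimal sketch of the variance aggregation above, assuming the N homography-warped feature maps have already been stacked into a single tensor (the warping itself is omitted here):

```python
# Variance-based cost matrix over N warped feature volumes.
import torch

def variance_cost_volume(warped_features: torch.Tensor) -> torch.Tensor:
    """warped_features: (N, C, D, H, W) -> cost matrix E: (C, D, H, W).

    E = (1/N) * sum_i (V_i - mean(V))^2, computed element-wise, so the cost
    encodes how much the N views disagree at each depth hypothesis.
    """
    mean = warped_features.mean(dim=0, keepdim=True)
    return ((warped_features - mean) ** 2).mean(dim=0)

# Example with 7 views, 32 feature channels, 48 depth hypotheses, 128 x 160 features.
E = variance_cost_volume(torch.randn(7, 32, 48, 128, 160))
print(E.shape)  # torch.Size([32, 48, 128, 160])
```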
Step 4, generating a depth probability distribution map from the cost matrix obtained in step 3 through a three-dimensional convolutional neural network with a multi-scale structure.
Because the cost matrix contains much noise, it needs to be optimized with a three-dimensional convolutional neural network. As shown in fig. 2, the three-dimensional convolutional neural network with a multi-scale structure uses an encoder-decoder-like architecture in which each level scales and fuses the feature map, finally transforming the cost matrix into the probability distribution of the depth map, from which the depth map is then generated.
The structure of the multi-scale three-dimensional convolutional neural network is as follows. The encoder has 4 levels in total: the first level consists of three convolutional layers with 32 channels; the second level is reduced to 8 channels and from three convolutional layers to two; the third and fourth levels likewise keep two layers of 8-channel convolution. The filter stride between levels is 2, so the feature map size is halved after each level. The decoder also has 4 levels and can be regarded as the inverse of the encoder: the first to third levels are two-layer 8-channel convolutions and the last level is a three-layer 32-channel convolution; a hole convolution operation is used between levels, so the feature map size doubles after each level. The feature map sizes of corresponding encoder and decoder levels therefore stay consistent, which facilitates the subsequent inter-level information fusion.
At each level of the decoder, the output of the previous decoding level is fused with the corresponding encoding level. The inter-level information fusion of the multi-scale three-dimensional convolutional neural network proceeds as follows, from top to bottom. At the top level, encoding and decoding are connected by a convolution operation. The second encoding level first passes through an 8-channel neural network layer before reaching its decoding layer, and that decoding layer fuses the output from the level above with the output from its left. The third encoding level passes through two 8-channel neural network layers before reaching its decoding layer; similarly, the second 8-channel layer of the third level fuses the 8-channel layer of the second level with the output of the first 8-channel layer on its left, and the decoding layer of the third level fuses the output of the second-level decoding layer with the output of the second 8-channel layer on its left. For the fourth level there are three 8-channel neural network layers between the encoding and decoding layers, and likewise the two middle 8-channel layers and the decoding layer of the fourth level fuse the outputs from the level above and from their left. The output of the last convolutional layer has 1 channel and is converted into the depth probability distribution map by a softmax operation. The depth probability distribution map records, for every pixel, the probability of each depth value; the higher the probability, the more likely the pixel lies at that depth.
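The following is a deliberately simplified PyTorch sketch of such an encoder-decoder over the cost matrix. The channel counts (32 at the top level, 8 below) follow the text, but only one convolution per level is used, the per-level fusion wiring described above is approximated by additive skip connections, and transposed convolutions are used for upsampling in place of the hole-convolution operation mentioned above; all of these are simplifying assumptions.

```python
# Simplified multi-scale 3D CNN: encoder-decoder over the cost matrix with
# additive skips and a softmax over the depth dimension.
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv3d(in_ch, out_ch, stride=1):
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
        nn.BatchNorm3d(out_ch),
        nn.ReLU(inplace=True),
    )

class CostRegularization(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc0 = conv3d(32, 32)               # top level, 32 channels
        self.enc1 = conv3d(32, 8, stride=2)      # downsample, 8 channels
        self.enc2 = conv3d(8, 8, stride=2)
        self.enc3 = conv3d(8, 8, stride=2)
        self.dec2 = nn.ConvTranspose3d(8, 8, 3, stride=2, padding=1, output_padding=1)
        self.dec1 = nn.ConvTranspose3d(8, 8, 3, stride=2, padding=1, output_padding=1)
        self.dec0 = nn.ConvTranspose3d(8, 32, 3, stride=2, padding=1, output_padding=1)
        self.prob = nn.Conv3d(32, 1, 3, padding=1)   # final 1-channel output

    def forward(self, cost):                     # cost: (B, 32, D, H, W)
        e0 = self.enc0(cost)
        e1 = self.enc1(e0)
        e2 = self.enc2(e1)
        e3 = self.enc3(e2)
        d2 = F.relu(self.dec2(e3) + e2)          # fuse decoder output with encoder level
        d1 = F.relu(self.dec1(d2) + e1)
        d0 = F.relu(self.dec0(d1) + e0)
        logits = self.prob(d0).squeeze(1)        # (B, D, H, W)
        return F.softmax(logits, dim=1)          # depth probability distribution map

prob = CostRegularization()(torch.randn(1, 32, 48, 32, 40))
print(prob.shape)  # torch.Size([1, 48, 32, 40])
```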
Step 5, regressing the depth probability distribution map into an initial depth map.
In the embodiment of the invention, the depth probability distribution map is recovered into a depth map by an entropy operation. Although the traditional winner-take-all algorithm could simply take the depth at the maximum probability, that operation is not differentiable and needs some improvement; instead, for each pixel, the depth values are multiplied by their corresponding probabilities and summed, with the specific formula:
$$F = \sum_{d = d_{\min}}^{d_{\max}} d \cdot P(d)$$

where F is the depth value of the pixel in the initial depth map recovered from the probability map, d is a candidate depth value of the pixel, P(d) is the probability corresponding to depth value d, and $d_{\min}$ and $d_{\max}$ are the minimum and maximum depth values in the probability map, respectively.
However, if the entropy calculation were applied directly over all depth values, the depth probability distribution of incorrectly matched pixels would not be concentrated around a single peak; therefore the method takes, for each pixel, the four nearest depth values and applies the above formula to obtain the pixel's depth in the initial depth map. The four nearest depth values are the four depth values closest to the depth at the peak (maximum probability).
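A minimal sketch of this restricted regression, assuming a uniform grid of candidate depths, a window of four depth bins centred on the per-pixel peak, and renormalisation of their probabilities (the patent only states that the four depth values nearest the peak enter the formula, so the exact windowing and the renormalisation are assumptions):

```python
# Regress the initial depth map from the probability volume using only the
# four depth hypotheses nearest the per-pixel peak.
import torch

def regress_initial_depth(prob: torch.Tensor, depth_values: torch.Tensor) -> torch.Tensor:
    """prob: (B, D, H, W) depth probability distribution; depth_values: (D,)."""
    B, D, H, W = prob.shape
    peak = prob.argmax(dim=1, keepdim=True)                      # (B, 1, H, W)
    # indices of four bins around the peak, clamped to the valid range (assumption)
    offsets = torch.tensor([-1, 0, 1, 2], device=prob.device).view(1, 4, 1, 1)
    idx = (peak + offsets).clamp(0, D - 1)                       # (B, 4, H, W)
    p = prob.gather(1, idx)                                      # (B, 4, H, W)
    d = depth_values.view(1, D, 1, 1).expand(B, D, H, W).gather(1, idx)
    p = p / p.sum(dim=1, keepdim=True).clamp_min(1e-8)           # renormalise (assumption)
    return (p * d).sum(dim=1)                                    # (B, H, W) initial depth map

depths = torch.linspace(2.0, 50.0, 48)                           # d_min .. d_max, illustrative
depth_map = regress_initial_depth(torch.softmax(torch.randn(1, 48, 32, 40), dim=1), depths)
print(depth_map.shape)  # torch.Size([1, 32, 40])
```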
Step 6, optimizing the depth map obtained in step 5 and outputting the optimized depth map.
Because the operation in step 5 makes the obtained depth map too smooth, the invention introduces a reference image, fuses it with the result of step 5 into a 4-channel input, and appends a depth residual learning network. The depth residual learning network consists of a two-dimensional convolutional neural network with three 32-channel layers and one 1-channel layer; to allow negative residual values to be learned, the last layer does not include a BN layer or a ReLU layer.
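A minimal PyTorch sketch of this refinement step; adding the 1-channel output back onto the initial depth map as a residual is an assumption consistent with the phrase "negative residual values", and the use of BN + ReLU on the three 32-channel layers is likewise assumed:

```python
# Depth residual learning network: 4-channel input (initial depth + RGB reference),
# three 32-channel conv layers, one 1-channel layer without BN/ReLU.
import torch
import torch.nn as nn

class DepthRefineNet(nn.Module):
    def __init__(self):
        super().__init__()
        def block(in_ch, out_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
        self.body = nn.Sequential(
            block(4, 32), block(32, 32), block(32, 32),
            nn.Conv2d(32, 1, 3, padding=1),   # last layer: no BN, no ReLU
        )

    def forward(self, init_depth, ref_img):   # (B, 1, H, W), (B, 3, H, W)
        residual = self.body(torch.cat([init_depth, ref_img], dim=1))
        return init_depth + residual           # optimized depth map (assumed residual add)

refined = DepthRefineNet()(torch.randn(1, 1, 128, 160), torch.randn(1, 3, 128, 160))
print(refined.shape)  # torch.Size([1, 1, 128, 160])
```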
Steps 2-6 together form the three-dimensional reconstruction model, in which the two-dimensional convolutional neural network, the multi-scale three-dimensional convolutional neural network and the depth residual learning network all need to be optimized.
Step 7, training the three-dimensional reconstruction model. The first-order norms between the calibrated real depth map and, respectively, the initial depth map and the optimized depth map are computed and summed to give the training loss function, with the specific formula:
$$L = \sum_{p \in P} \left\lVert d(p) - \hat{d}_{i}(p) \right\rVert_{1} + \left\lVert d(p) - \hat{d}_{r}(p) \right\rVert_{1}$$

where L is the loss function, P is the set of valid pixels in the image, p is a pixel in P, d(p) is the depth value of pixel p in the real depth map, $\hat{d}_{i}(p)$ is the depth value of pixel p in the initial depth map, $\hat{d}_{r}(p)$ is the depth value of pixel p in the optimized depth map, and $\lVert\cdot\rVert_{1}$ denotes the first-order norm.
The invention trains the two-dimensional convolutional neural network, the three-dimensional convolutional neural network and the depth residual learning network and optimizes their parameters. During model training, the real depth map d(p) of the scene is used as the label; however, in most cases point cloud data of the scene is easier to obtain, so the point cloud is first converted into a mesh using the Screened Poisson Surface Reconstruction (SPSR) algorithm proposed by Kazhdan et al. and then rendered into real depth maps of the scene for each view angle. During training, the loss function L is used as the optimization objective to guide training, the smaller its value the better, and the model parameters are continuously updated by a gradient descent algorithm until the loss function reaches its minimum.
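A minimal sketch of the loss above; the validity-mask convention (pixels with positive ground-truth depth) and the choice of optimizer are assumptions:

```python
# Sum of L1 (first-order norm) errors of the initial and optimized depth maps
# against the ground-truth depth map, over valid pixels only.
import torch

def reconstruction_loss(gt_depth, init_depth, refined_depth):
    """All tensors: (B, H, W). Returns the scalar loss L."""
    valid = gt_depth > 0                                   # set P of valid pixels (assumption)
    l_init = (init_depth - gt_depth).abs()[valid].sum()
    l_refined = (refined_depth - gt_depth).abs()[valid].sum()
    return l_init + l_refined

# Typical training step (Adam over all three sub-networks, an assumption):
# loss = reconstruction_loss(gt, init_d, refined_d)
# optimizer.zero_grad(); loss.backward(); optimizer.step()
```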
After the three-dimensional reconstruction model has been trained, in the embodiment of the invention the unmanned aerial vehicle captures 7 images at different view angles by aerial photography; steps 1-6 are executed with each image taken in turn as the reference image, yielding a depth map for the view angle of each reference image. The 7 depth maps thus obtained are fused and converted into the three-dimensional point cloud data of the finally reconstructed scene.
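As a rough illustration of this final fusion step, each optimized depth map can be back-projected through its camera and the per-view points concatenated; the pinhole camera convention used here (x_cam = R · x_world + t) and the omission of any cross-view consistency filtering are simplifying assumptions, since the patent does not specify the fusion procedure in detail:

```python
# Back-project a per-view depth map into world-space 3D points.
import numpy as np

def depth_to_points(depth, K, R, t):
    """depth: (H, W); K: (3, 3) intrinsics; R, t: world-to-camera pose."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T   # (3, H*W) homogeneous pixels
    cam = np.linalg.inv(K) @ pix * depth.reshape(1, -1)                 # camera-space coordinates
    world = R.T @ (cam - t.reshape(3, 1))                               # world-space coordinates
    return world.T                                                      # (H*W, 3)

# Fuse the seven views by concatenating their back-projected points:
# cloud = np.concatenate([depth_to_points(d, K[i], R[i], t[i]) for i, d in enumerate(depth_maps)])
```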
As shown in fig. 4, a three-dimensional reconstruction apparatus 40 according to an embodiment of the present invention includes: a processor 41 and a memory 42.
A memory 42 for storing computer programs, computer instructions, and the like; the computer program comprises a program that can perform the method shown in fig. 1 and will not be described in detail here.
The computer programs, computer instructions, etc. described above may be stored in one or more memories 42 in partitions. And the above-mentioned computer program, computer instructions, data, etc. can be called by the processor 41.
A processor 41 for executing the computer program stored in the memory 42 to implement the steps of the three-dimensional reconstruction method according to the above embodiments.
The processor 41 and the memory 42 may be separate structures or may be integrated structures integrated together. When the processor 41 and the memory 42 are separate structures, the memory 42 and the processor 41 may be coupled by a bus 43.
The functional modules for three-dimensional reconstruction of a scene implemented by the computer program stored in the memory 42 include:
the image input module is used for inputting multi-view two-dimensional images obtained by aerial photography of the unmanned aerial vehicle and selecting one of the multi-view two-dimensional images as a reference image;
inputting the multi-view two-dimensional images into the three-dimensional reconstruction model, in which a two-dimensional convolutional neural network extracts a feature map from each image; the output feature maps undergo homography transformation, each feature map being transformed onto a plane parallel to the reference image according to parameters such as the camera view frustum, and the homography-transformed feature maps are merged into a cost matrix; secondly, a depth probability distribution map is generated from the cost matrix by the three-dimensional convolutional neural network with a multi-scale structure and then regressed into an initial depth map; the initial depth map is fused with the reference image, input into the depth residual learning network, and the optimized depth map is output;
sequentially taking two-dimensional images shot by the unmanned aerial vehicle at different visual angles as reference images, and outputting optimized depth maps at corresponding visual angles by the three-dimensional reconstruction model;
and the three-dimensional scene output module is used for fusing the optimized depth maps under all the visual angles and outputting the final three-dimensional point cloud of the reconstructed scene.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (7)

1. A three-dimensional reconstruction method based on multiple visual angles of an unmanned aerial vehicle is characterized by comprising the following steps:
step 1, acquiring multi-view two-dimensional images under a scene to be three-dimensionally reconstructed through unmanned aerial vehicle aerial photography, and selecting one of the multi-view two-dimensional images as a reference image;
step 2, extracting a two-dimensional characteristic map from each two-dimensional image through a two-dimensional convolution neural network;
step 3, performing homography transformation on the extracted feature graph to convert the feature graph into a plane parallel to the reference image, and constructing a cost matrix by using the feature graph after the homography transformation;
step 4, generating a depth probability distribution map by using the cost matrix through a three-dimensional convolution neural network with a multi-scale structure;
step 5, returning the depth probability distribution map to an initial depth map by utilizing entropy operation;
in the step 5, the depth probability distribution of each pixel is obtained through the depth probability distribution map, four depth values closest to the peak value are selected to carry out entropy calculation, the depth values are multiplied by the corresponding depth value probabilities and summed, and the depth of the pixel in the initial depth map is obtained;
step 6, fusing the initial depth map and the reference image, inputting the fused initial depth map and the reference image into a depth residual error learning network, and outputting an optimized depth map;
step 7, training the two-dimensional convolutional neural network, the three-dimensional convolutional neural network and the depth residual learning network, and optimizing the network parameters; the loss function during training is the sum of the first-order norms between the calibrated real depth map and, respectively, the initial depth map and the optimized depth map; each training sample is a set of multi-view two-dimensional images, and the label is a real depth map of the scene; after the networks are trained, the two-dimensional images at the different view angles in step 1 are taken in turn as the reference image and steps 2-6 are executed to obtain the optimized depth map at each corresponding view angle, and finally the optimized depth maps at all view angles are fused to obtain the final three-dimensional point cloud of the scene.
2. The method according to claim 1, wherein in the step 2, the feature extraction is performed on the two-dimensional image by using a convolutional neural network with eight layers, the translation step size of the filter is changed from 1 to 2 after every three layers, and batch normalization processing and a ReLU activation function are added after other layers except the last layer; the feature size after eight layers of convolutional neural networks becomes one quarter of the input two-dimensional image.
3. The method according to claim 1, wherein in step 3, the feature maps corresponding to the two-dimensional images under different viewing angles are subjected to homography transformation and then merged into a cost matrix by using variance operation.
4. The method according to claim 1, wherein in the step 4, the three-dimensional convolutional neural network of the multi-scale structure comprises an encoding and decoding structure, each layer performs scale transformation and fusion on the feature map, and the cost matrix is transformed into the depth probability distribution map.
5. The method according to claim 1 or 4, wherein in step 4 the three-dimensional convolutional neural network with a multi-scale structure comprises: an encoding part and a decoding part, each with 4 levels from bottom to top; the first level consists of three convolutional layers with 32 channels, and the second to fourth levels each consist of two convolutional layers with 8 channels; a hole convolution operation is adopted between levels, doubling the feature map size after each level; the decoding part is regarded as the inverse process of the encoding, and the feature map sizes of corresponding encoding and decoding levels stay consistent; at each level of the decoding part, the output of the previous decoding layer is fused with the corresponding encoding layer; and the output of the last decoding layer is converted into the depth probability distribution map by a softmax operation.
6. The method according to claim 1, wherein in step 6, the deep residual learning network is formed by a two-dimensional convolutional neural network of 3-layer 32 channels and 1-layer 1 channels, and the last layer of the deep residual learning network does not include a batch normalization processing layer and a ReLU layer.
7. A device for the unmanned aerial vehicle multi-view three-dimensional reconstruction method, characterized by comprising a processor and a memory; a computer program implementing the unmanned aerial vehicle multi-view three-dimensional reconstruction method is stored in the memory; and the processor executes the computer program stored in the memory to perform the three-dimensional reconstruction of the scene.
CN202010393797.1A 2020-05-11 2020-05-11 Three-dimensional reconstruction method and device based on multiple visual angles of unmanned aerial vehicle Active CN111652966B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010393797.1A CN111652966B (en) 2020-05-11 2020-05-11 Three-dimensional reconstruction method and device based on multiple visual angles of unmanned aerial vehicle

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010393797.1A CN111652966B (en) 2020-05-11 2020-05-11 Three-dimensional reconstruction method and device based on multiple visual angles of unmanned aerial vehicle

Publications (2)

Publication Number Publication Date
CN111652966A CN111652966A (en) 2020-09-11
CN111652966B true CN111652966B (en) 2021-06-04

Family

ID=72343695

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010393797.1A Active CN111652966B (en) 2020-05-11 2020-05-11 Three-dimensional reconstruction method and device based on multiple visual angles of unmanned aerial vehicle

Country Status (1)

Country Link
CN (1) CN111652966B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112233228B (en) * 2020-10-28 2024-02-20 五邑大学 Unmanned aerial vehicle-based urban three-dimensional reconstruction method, device and storage medium
CN112697044B (en) * 2020-12-17 2021-11-26 北京航空航天大学 Static rigid object vision measurement method based on unmanned aerial vehicle platform
CN112750201B (en) * 2021-01-15 2024-03-29 浙江商汤科技开发有限公司 Three-dimensional reconstruction method, related device and equipment
CN112734915A (en) * 2021-01-19 2021-04-30 北京工业大学 Multi-view stereoscopic vision three-dimensional scene reconstruction method based on deep learning
CN112907463A (en) * 2021-01-28 2021-06-04 华南理工大学 Depth image error point removing method combining image semantics and three-dimensional information
CN112861747B (en) * 2021-02-22 2022-06-07 深圳大学 Cross-view image optimization method and device, computer equipment and readable storage medium
CN112950786A (en) * 2021-03-01 2021-06-11 哈尔滨理工大学 Vehicle three-dimensional reconstruction method based on neural network
CN113066165B (en) * 2021-03-19 2022-06-21 北京邮电大学 Three-dimensional reconstruction method and device for multi-stage unsupervised learning and electronic equipment
CN113393582A (en) * 2021-05-24 2021-09-14 电子科技大学 Three-dimensional object reconstruction algorithm based on deep learning
CN113284251B (en) * 2021-06-11 2022-06-03 清华大学深圳国际研究生院 Cascade network three-dimensional reconstruction method and system with self-adaptive view angle
CN115661810A (en) * 2021-08-27 2023-01-31 同方威视技术股份有限公司 Security check CT target object identification method and device
CN113962858B (en) * 2021-10-22 2024-03-26 沈阳工业大学 Multi-view depth acquisition method
CN115239915B (en) * 2022-09-21 2022-12-09 季华实验室 VR scene real-time reconstruction method and device, electronic equipment and storage medium
CN115601498A (en) * 2022-09-27 2023-01-13 Inner Mongolia University of Technology (CN) Single image three-dimensional reconstruction method based on RealPoin3D

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109389671A (en) * 2018-09-25 2019-02-26 南京大学 A kind of single image three-dimensional rebuilding method based on multistage neural network
CN110211061A (en) * 2019-05-20 2019-09-06 清华大学 List depth camera depth map real time enhancing method and device neural network based

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2978855B1 (en) * 2011-08-04 2013-09-27 Commissariat Energie Atomique METHOD AND DEVICE FOR CALCULATING A DEPTH CARD FROM A SINGLE IMAGE
CN104318569B (en) * 2014-10-27 2017-02-22 北京工业大学 Space salient region extraction method based on depth variation model
CN106485192B (en) * 2015-09-02 2019-12-06 富士通株式会社 Training method and device of neural network for image recognition
CN108416840B (en) * 2018-03-14 2020-02-18 大连理工大学 Three-dimensional scene dense reconstruction method based on monocular camera
CN109242959B (en) * 2018-08-29 2020-07-21 清华大学 Three-dimensional scene reconstruction method and system

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109389671A (en) * 2018-09-25 2019-02-26 南京大学 A kind of single image three-dimensional rebuilding method based on multistage neural network
CN110211061A (en) * 2019-05-20 2019-09-06 清华大学 List depth camera depth map real time enhancing method and device neural network based

Also Published As

Publication number Publication date
CN111652966A (en) 2020-09-11

Similar Documents

Publication Publication Date Title
CN111652966B (en) Three-dimensional reconstruction method and device based on multiple visual angles of unmanned aerial vehicle
Li et al. A multi-scale guided cascade hourglass network for depth completion
CN110443842B (en) Depth map prediction method based on visual angle fusion
CN109255831B (en) Single-view face three-dimensional reconstruction and texture generation method based on multi-task learning
Xu et al. Structured attention guided convolutional neural fields for monocular depth estimation
CN111462329B (en) Three-dimensional reconstruction method of unmanned aerial vehicle aerial image based on deep learning
CN109377530B (en) Binocular depth estimation method based on depth neural network
CN113066168B (en) Multi-view stereo network three-dimensional reconstruction method and system
CN112396645B (en) Monocular image depth estimation method and system based on convolution residual learning
WO2021018163A1 (en) Neural network search method and apparatus
CN107358576A (en) Depth map super resolution ratio reconstruction method based on convolutional neural networks
CN113345082B (en) Characteristic pyramid multi-view three-dimensional reconstruction method and system
CN113538243B (en) Super-resolution image reconstruction method based on multi-parallax attention module combination
CN111127538A (en) Multi-view image three-dimensional reconstruction method based on convolution cyclic coding-decoding structure
CN111951195A (en) Image enhancement method and device
CN110930500A (en) Dynamic hair modeling method based on single-view video
CN114418030A (en) Image classification method, and training method and device of image classification model
CN115984494A (en) Deep learning-based three-dimensional terrain reconstruction method for lunar navigation image
CN114757862B (en) Image enhancement progressive fusion method for infrared light field device
CN113096239B (en) Three-dimensional point cloud reconstruction method based on deep learning
CN112115786A (en) Monocular vision odometer method based on attention U-net
CN116342675A (en) Real-time monocular depth estimation method, system, electronic equipment and storage medium
CN112116646B (en) Depth estimation method for light field image based on depth convolution neural network
CN115565039A (en) Monocular input dynamic scene new view synthesis method based on self-attention mechanism
CN115330935A (en) Three-dimensional reconstruction method and system based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20211229

Address after: 908, block a, floor 8, No. 116, Zizhuyuan Road, Haidian District, Beijing 100089

Patentee after: ZHONGZI DATA CO.,LTD.

Address before: 100191 No. 37, Haidian District, Beijing, Xueyuan Road

Patentee before: BEIHANG University

TR01 Transfer of patent right