CN115564888A - Visible light multi-view image three-dimensional reconstruction method based on deep learning - Google Patents
- Publication number
- CN115564888A (application CN202210845580.9A)
- Authority
- CN
- China
- Prior art keywords
- depth
- map
- depth map
- feature
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention provides a visible light multi-view image three-dimensional reconstruction method based on deep learning, built as an improvement on the MVSNet network. The batch normalization layers and nonlinear activation function layers in the network are replaced with fused Inplace-ABN layers, reducing GPU memory usage. A weighted mean measurement method based on group-wise similarity is designed to reduce the feature dimension of the cost volume, yielding a more lightweight cost volume, compressing the network parameters, and lowering both computation and GPU memory consumption. To address the problem that the depth map resolution is lower than that of the input image because MVSNet uses only a low-scale feature map, a feature pyramid module extracts multi-scale feature maps and a staged multi-scale iterative depth estimation scheme is designed. While maintaining accuracy, multiple rounds of depth iteration reduce the average number of depth planes in the cost volume, so the cost volume attains higher spatial resolution and the depth map estimate becomes more accurate. Finally, the output depth maps are filtered and fused to complete the three-dimensional scene reconstruction task.
Description
Technical Field
The invention belongs to the field of computer image processing, and relates to a method that performs three-dimensional reconstruction from visible light multi-view images based on deep learning and outputs a three-dimensional point cloud.
Background
As a technology for finely reconstructing real-world scenes, three-dimensional reconstruction plays an important role in daily life and in production work. In three-dimensional reconstruction, the depth of a pixel is the distance, measured along the camera's principal axis, from the camera's optical center to the spatial three-dimensional point that projects onto that pixel. A depth map records the depth of every pixel in an image; with an image and its depth map, the pixels can be back-projected into three-dimensional space to recover a small patch of point cloud, and with enough images and depth maps a sufficiently dense point cloud can be obtained.

MVSNet is a classical deep-learning-based multi-view stereo (MVS) method that follows the idea of the plane-sweep algorithm. Its main advantages are that features are extracted by a convolutional neural network, the high-dimensional cost volume it constructs preserves high-level spatial-structure semantic information, and the cost volume is regularized by a 3D CNN; it runs much faster than traditional methods and handles low-texture regions better. However, it has some evident shortcomings. MVSNet abandons the pixel map and instead estimates depth from feature maps. Its VGG-style feature extraction network applies multi-layer convolutions that progressively shrink the input to extract image features at different levels; the network downsamples twice, so the feature map resolution drops to 1/16 of the original image area, and the width and height of the constructed cost volume are only 1/4 of those of the original image. Because the width and height of the depth map equal those of the cost volume, the area of the final predicted depth map is only 1/16 of the reference image, and the convolution operations make the edges of the target object overly smooth. To counter the resolution loss and edge smoothing, MVSNet adds a 2D CNN upsampling module that refines the H/4 x W/4 initial depth map, interpolating with the edge features of the original image to produce the full-size H x W depth map. Since this refinement operates on the initial depth map purely in two dimensions, the three-dimensional high-level semantic information contained in the cost volume is not effectively exploited.
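As a brief illustration of the depth-map-to-point-cloud relationship described above, the following sketch back-projects a depth map into world space with the standard pinhole camera model; the variable names (depth, K, R, t) and the world-to-camera convention are illustrative assumptions, not notation taken from the patent.

```python
import numpy as np

def depth_to_point_cloud(depth, K, R, t):
    """Back-project a depth map (H x W) into world-space 3D points.

    depth : per-pixel depth along the camera's principal axis
    K     : 3x3 camera intrinsics; R, t : world-to-camera rotation/translation
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))                      # pixel grid
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T   # 3 x HW homogeneous pixels
    cam = np.linalg.inv(K) @ pix * depth.reshape(1, -1)                 # camera coordinates
    world = R.T @ (cam - t.reshape(3, 1))                               # invert the extrinsics
    valid = depth.reshape(-1) > 0                                       # drop pixels without depth
    return world.T[valid]                                               # N x 3 point cloud
```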
Disclosure of Invention
In order to overcome the defects of the prior art, the invention discloses a visible light multi-view image three-dimensional reconstruction method based on deep learning, improved from the MVSNet network. The batch normalization layers and nonlinear activation function layers in the network are replaced with fused Inplace-ABN layers, reducing GPU memory usage. A weighted mean measurement method based on group-wise similarity is designed to reduce the feature dimension of the cost volume, yielding a more lightweight cost volume, compressing the network parameters, and lowering both computation and GPU memory consumption. To address the problem that the depth map resolution is lower than that of the input image because MVSNet uses only a low-scale feature map, a feature pyramid module extracts multi-scale feature maps and a staged multi-scale iterative depth estimation scheme is designed. While maintaining accuracy, multiple rounds of depth iteration reduce the average number of depth planes in the cost volume, so the cost volume attains higher spatial resolution and the depth map estimate becomes more accurate. Finally, the output depth maps are filtered and fused to complete the scene three-dimensional reconstruction task.
The technical route adopted by the invention is as follows:
A multi-view image three-dimensional reconstruction method based on deep learning comprises the following steps:
step 1: performing incremental SfM on an image group of a scene to be predicted, and calculating camera parameters of each image and sparse point cloud of the scene to be predicted;
step 1.1: and reading in the image group of the scene to be predicted by using a COLMAP program, performing an incremental motion recovery structure algorithm, and calculating to obtain the camera parameters of each image and the sparse point cloud of the scene to be predicted.
Step 2: Design an improved depth estimation network based on MVSNet, and feed the scene to be predicted into the network to obtain a depth map and a probability map for each image.

Step 2.1: For an original image of size H x W, apply the same extraction process as the MVSNet feature extractor to obtain a 32-channel high-dimensional feature map, then perform multi-layer convolution and two rounds of 2x interpolation upsampling; after each upsampling, aggregate the result with the same-resolution feature map of the preceding stage, finally obtaining feature maps of sizes H/4 x W/4 x 32, H/2 x W/2 x 16 and H x W x 8.
Step 2.2: and for each adjacent visual angle, extracting a point cloud set of a common-view area of the adjacent visual angle and the reference visual angle from the sparse point cloud according to the camera parameters and the sparse point cloud of the scene obtained in the step 1. And calculating a base line angle between each point in the point cloud set and the optical centers and the main optical axes of the two image cameras, calculating a score for the point by using a piecewise Gaussian function, and adding the scores of all the points to obtain a total score representing the matching degree score between the two images.
Step 2.3: and dividing 32-channel feature bodies obtained by carrying out micro-homography transformation on feature maps extracted from adjacent visual angles into G channel groups, and calculating the similarity of each group and the channel group corresponding to the feature body of the reference visual angle by adopting an inner product mode. A similarity map of the G channel is obtained for each contiguous viewing angle. And carrying out normalized weighted mean aggregation by using the matching degree score as a weighting coefficient between the similarity mapping bodies of all adjacent visual angles, and finally obtaining the G channel cost body of the group mean measurement.
Step 2.4: and (4) uniformly setting 64 depth planes in the depth range of the whole scene by using the feature graph with the lowest scale extracted by the feature pyramid module, and constructing an H/4 multiplied by W/4 multiplied by 64 multiplied by G cost body by using the grouping similarity mean value measurement method in the step 2.3, wherein G is the number of groups. Then, the cost body is normalized by using 3D CNN to obtain a probability body, and a H/4 xW/4 coarse depth map is estimated. Wherein the batch normalization layer and the nonlinear activation function layer after each convolution layer in the 3D CNN are replaced by an Inplace-ABN layer.
Step 2.5: and 2 times of upsampling is carried out on the coarse depth map estimated in the step 2.4 by utilizing the mesoscale feature map extracted by the feature pyramid module to obtain an H/2 xW/2 upsampled depth map, the depth map is used as a prior depth curved surface, 1/128 of the scene depth range is used as an interval, and 32 equidistant relative depth surfaces are arranged in front of and behind the prior depth curved surface. After the relative depth surface is set up, a H/2 xW/2 x32 xG cost body is constructed by utilizing a grouping similarity mean value measurement method. Regularizing the cost body by using the 3D CNN module in the step 2.4 to obtain a probability body, estimating a relative depth map of H/2 xW/2, and superposing the relative depth map and a result after bilinear interpolation upsampling of the prior depth map to obtain an intermediate-level depth map of H/2 xW/2.
Step 2.6: similar to the step 2.5, performing 2-fold upsampling on the intermediate-level depth map output in the step 2.5 by using a high-scale feature map extracted by a feature pyramid module to obtain an H × W upsampled depth map, taking the depth map as a prior depth curved surface, setting up 8 equidistant relative depth planes in front and at the back of the prior depth curved surface, wherein the plane interval is 1/256 of the scene depth, constructing an H × W × 8 × G cost body by using a grouping similarity mean value measurement method, obtaining a probability body by using the regularization of the 3D CNN module in the step 2.5, estimating the relative depth map with the size of H × W, and overlapping the relative depth map with the result of bilinear interpolation upsampling of the intermediate-level depth map to obtain a final depth map.
And step 3: and filtering and fusing the depth maps of all the images according to the geometric consistency to generate three-dimensional point cloud data of the scene to be predicted.
And 4, step 4: generating scene three-dimensional point cloud data;
step 4.1: and performing threshold value screening on the depth map and the probability map obtained from each image through the probability map, performing depth filtering on pixel depth meeting the threshold value through double-view geometric consistency after the depth map and the probability map of each image are obtained, and fusing the filtered depth pixels to obtain point cloud data.
The method is suitable for three-dimensional reconstruction tasks on visible light multi-view images, such as building model reconstruction and unmanned aerial vehicle photogrammetry.
Compared with the prior art, the invention has the following advantages:
(1) When the MVSNet network reconstructs an image, its GPU memory consumption is excessive, which greatly limits its application to high-resolution scenes. In the improved deep-learning MVSNet method of the invention, the batch normalization layers and nonlinear activation function layers in the network are replaced with fused Inplace-ABN layers, reducing GPU memory usage.

(2) The designed weighted mean measurement method based on group-wise similarity reduces the feature dimension of the cost volume, yielding a more lightweight cost volume, compressing the network parameters, and lowering both computation and memory consumption.

(3) To address the problem that the depth map resolution is lower than that of the input image because MVSNet uses only a low-scale feature map, a feature pyramid module extracts multi-scale feature maps and a staged multi-scale iterative depth estimation scheme is designed. While maintaining accuracy, multiple rounds of depth iteration reduce the average number of depth planes in the cost volume, so the cost volume attains higher spatial resolution and the depth map estimate becomes more accurate.
Drawings
Fig. 1 is a diagram of the network architecture of the present invention.

Fig. 2 is a diagram of the feature pyramid network structure of the present invention.

Fig. 3 is a flow chart of the construction of the group-wise mean cost volume of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
The network structure of this patent is shown in fig. 1. First, the feature extraction structure of MVSNet is improved: pyramid feature extraction is performed on the input reference-view image and adjacent-view images with a Feature Pyramid Network (FPN), producing a series of feature maps at different scales, which are then built into cost volumes of different scales, from low to high, by the mean measurement method based on group-wise similarity. After the low-scale cost volume is regularized in 3D and the estimated depth map of that scale is output, this depth map serves as prior depth information for iteratively correcting the depths in the higher-scale cost volume; finally, through multi-stage, multi-scale iteration, a depth map with the same resolution as the reference-view image is obtained.

Based on the FPN idea, the feature extraction network of MVSNet is improved to extract several feature maps at different scales. As shown in fig. 2, the same extraction process as the original MVSNet feature extractor is applied first; after the 32-channel high-dimensional feature map is obtained, multi-layer convolution and two rounds of interpolation upsampling are performed, and after each upsampling the result is aggregated with the same-resolution feature map of the preceding stage, finally giving three feature maps at different scales. For an original image of size H x W, the FPN output feature maps have sizes H/4 x W/4 x 32, H/2 x W/2 x 16 and H x W x 8; each is aggregated with high-level semantic features and used to construct the cost volume of the corresponding stage.

Drawing on the group-wise mean measurement mechanism used in binocular stereo matching, the multi-view depth estimation network is improved by replacing the original variance-based measurement with a mean measurement based on group-wise similarity for constructing the cost volume. The specific flow is shown in fig. 3:
Denote the feature map of the reference view by $F_0$ and the feature map of the $i$-th adjacent view by $F_i$, and let the homographic projection of $F_i$ onto the $j$-th depth hypothesis plane $d_j$ be $F_{i,j}$. Both $F_0$ and $F_{i,j}$ are split along the channel dimension into $G$ groups, and the similarity between corresponding channel groups is computed group by group; the similarity of the $g$-th group, with $g \in \{0, 1, \ldots, G-1\}$, is

$$S_{i,j}^{g} = \left\langle F_0^{g},\, F_{i,j}^{g} \right\rangle,$$

where $F_0^{g}$ denotes the $g$-th channel group of the reference-view feature map, $F_{i,j}^{g}$ denotes the $g$-th channel group of the warped adjacent-view feature map, and $\langle\cdot,\cdot\rangle$ is the inner product over the channels of the group. Once the similarities of all $G$ groups have been computed, they form a feature-similarity map $S_{i,j}$ with $G$ channels. Writing the total number of depth hypothesis planes as $D$, with $j \in \{0, 1, \ldots, D-1\}$, the $D$ feature-similarity maps $S_{i,j}$ between the reference image and the $i$-th adjacent image can be combined into a $W \times H \times D \times G$ similarity volume $V_i$. Unlike the feature volume in MVSNet, $V_i$ records how similar the feature map of the adjacent view is to that of the reference view. Accordingly, instead of the variance-based aggregation that MVSNet applies to the feature volumes of the different adjacent views, the similarity volumes $V_i$ are aggregated by a mean to obtain the lightweight matching cost volume $C$. In a variance-based cost volume, a smaller variance at depth plane $d$ indicates a higher probability that the depth equals $d$; in the mean-based cost volume, a larger mean at depth plane $d$ indicates that the views agree more strongly at $d$, and hence a higher probability that the depth equals $d$. With the matching-degree score $w_i$ of the $i$-th adjacent view as the weight and $N-1$ adjacent views in total, the aggregation formula is

$$C = \frac{\sum_{i=1}^{N-1} w_i\, V_i}{\sum_{i=1}^{N-1} w_i}.$$
the size of the cost body C is W multiplied by H multiplied by D multiplied by G, the size of the cost body can be reduced to the original G/F based on the average value measurement of the grouping similarity, G =8 is set, and compared with the original 32-channel cost body, the operation consumption of a 3D U-Net regularization link is reduced.
Claims (4)
1. A visible light multi-view image three-dimensional reconstruction method based on deep learning is characterized by comprising the following steps:
step 1: performing incremental SfM on an image group of a scene to be predicted, and calculating to obtain camera parameters of each image and sparse point cloud of the scene to be predicted;
step 1.1: and reading in the image group of the scene to be predicted by using a COLMAP program, performing an incremental motion recovery structure algorithm, and calculating to obtain the camera parameters of each image and the sparse point cloud of the scene to be predicted.
Step 2: designing an improved depth estimation network based on MVSNet, inputting a scene to be predicted into the network for calculation, and obtaining a depth map and a probability map corresponding to each image;
step 2.1: and (3) adopting the same extraction process as the MVSNet feature extractor for an original image with the size of H multiplied by W, after obtaining a 32-channel high-dimensional feature map, performing multilayer convolution and two times of 2-time interpolation upsampling, after each time of interpolation upsampling, aggregating the upsampled feature map with the same resolution of the previous stage, and finally obtaining feature maps with the sizes of H/4 multiplied by W/4 multiplied by 32, H/2 multiplied by W/2 multiplied by 16 and H multiplied by W multiplied by 8.
Step 2.2: and for each adjacent visual angle, extracting a point cloud set of a common-view area of the adjacent visual angle and the reference visual angle from the sparse point cloud according to the camera parameters and the sparse point cloud of the scene obtained in the step 1. And calculating a base line angle between each point in the point cloud set and the optical centers and the main optical axis of the two image cameras, calculating a score for the point by using a piecewise Gaussian function, and adding the scores of all the points to obtain a total score representing the matching degree score between the two images.
Step 2.3: and dividing the 32-channel feature bodies obtained by carrying out micro-homography transformation on the feature maps extracted from the adjacent visual angles into G channel groups, and calculating the similarity of the channel groups corresponding to the feature bodies of the reference visual angles by each group in an inner product mode. A similarity map of the G channel for each adjacent viewing angle is obtained. And carrying out normalized weighted mean aggregation by using the matching degree score as a weighting coefficient between the similarity mapping bodies of all adjacent visual angles, and finally obtaining the G channel cost body of the group mean measurement.
Step 2.4: and (3) uniformly setting 64 depth planes in the depth range of the whole scene by using the feature map of the lowest scale extracted by the feature pyramid module, and constructing an H/4 multiplied by W/4 multiplied by 64 multiplied by G cost body by using the grouping similarity mean measurement method in the step 2.3, wherein G is the number of groups. Then, the cost body is normalized by using 3D CNN to obtain a probability body, and a H/4 xW/4 coarse depth map is estimated. Wherein the batch normalization layer and the nonlinear activation function layer after each convolution layer in the 3D CNN are replaced by an Inplace-ABN layer.
Step 2.5: and 2 times of upsampling is carried out on the coarse depth map estimated in the step 2.4 by utilizing the mesoscale feature map extracted by the feature pyramid module to obtain an H/2 xW/2 upsampled depth map, the depth map is used as a prior depth curved surface, 1/128 of the scene depth range is used as an interval, and 32 equidistant relative depth surfaces are arranged in front of and behind the prior depth curved surface. After the relative depth surface is established, a H/2 xW/2 x32 xG cost body is constructed by utilizing a grouping similarity mean measurement method. And (3) regularizing the cost body by using the 3D CNN module in the step 2.4 to obtain a probability body, estimating an H/2 xW/2 relative depth map, and superposing the relative depth map and a result obtained after bilinear interpolation upsampling of the prior depth map to obtain an H/2 xW/2 intermediate-level depth map.
Step 2.6: similar to the step 2.5, performing 2-fold upsampling on the intermediate-level depth map output in the step 2.5 by using a high-scale feature map extracted by a feature pyramid module to obtain an H × W upsampled depth map, taking the depth map as a prior depth curved surface, setting up 8 equidistant relative depth planes in front and at the back of the prior depth curved surface, wherein the plane interval is 1/256 of the scene depth, constructing an H × W × 8 × G cost body by using a grouping similarity mean value measurement method, obtaining a probability body by using the regularization of the 3D CNN module in the step 2.5, estimating the relative depth map with the size of H × W, and overlapping the relative depth map with the result of bilinear interpolation upsampling of the intermediate-level depth map to obtain a final depth map.
And step 3: and filtering and fusing the depth maps of all the images according to the geometric consistency to generate three-dimensional point cloud data of the scene to be predicted.
And 4, step 4: generating scene three-dimensional point cloud data;
step 4.1: and performing threshold value screening on the depth map and the probability map obtained from each image through the probability map, performing depth filtering on pixel depths meeting the threshold value through double-view geometric consistency after the depth map and the probability map of each image are obtained, and fusing the filtered depth pixels to obtain point cloud data.
2. The method as claimed in claim 1, wherein in step 2.1 a feature pyramid network structure is used to improve the feature extraction network of MVSNet and extract multi-scale image features, and the batch normalization and activation function layers are replaced with Inplace-ABN layers, reducing GPU memory consumption.

3. The method according to claim 1, wherein the weighted mean measurement method designed in step 2.2 converts the feature maps warped by the differentiable homography transformation into G-channel similarity volumes based on group-wise similarity, a view matching-degree algorithm is designed, and the similarity volumes of the adjacent views are aggregated into a lightweight cost volume by a matching-degree-weighted mean.

4. The method of claim 1, wherein in step 2.3 multi-stage iteration is performed over multi-scale mean cost volumes aggregated from the multi-scale feature maps, the depth map is refined by progressively increasing the spatial resolution, and finally a depth map and a probability map with the same size as the original image are output.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210845580.9A | 2022-07-18 | 2022-07-18 | Visible light multi-view image three-dimensional reconstruction method based on deep learning |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN115564888A | 2023-01-03 |
Family
ID=84738586
Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202210845580.9A | CN115564888A (en), pending | 2022-07-18 | 2022-07-18 |

Country Status (1)

| Country | Link |
|---|---|
| CN (1) | CN115564888A (en) |
Cited By (3)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116091712A | 2023-04-12 | 2023-05-09 | | Multi-view three-dimensional reconstruction method and system for computing resource limited equipment |
| CN117765273A | 2023-11-09 | 2024-03-26 | | Real-time stereo matching method based on multi-scale multi-category cost volume |
| CN118334255A | 2024-06-14 | 2024-07-12 | | High-resolution image three-dimensional reconstruction method, system and medium based on deep learning |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |