CN116310131A - Three-dimensional reconstruction method considering multi-view fusion strategy - Google Patents
Three-dimensional reconstruction method considering multi-view fusion strategy
- Publication number
- CN116310131A CN116310131A CN202310315104.0A CN202310315104A CN116310131A CN 116310131 A CN116310131 A CN 116310131A CN 202310315104 A CN202310315104 A CN 202310315104A CN 116310131 A CN116310131 A CN 116310131A
- Authority
- CN
- China
- Prior art keywords
- view
- stage
- depth
- map
- cost
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30244—Camera pose
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a three-dimensional reconstruction method considering a multi-view fusion strategy. In the first stage, the N-1 source feature maps are each homography-warped against the reference feature map to construct two-view cost volumes; each cost volume is passed into View-Net to obtain a weight map of that view relative to the reference image, and the weight maps are weightedly fused with the cost volumes, finally yielding a cost volume that retains the useful information. The cost volume is regularized by conventional 3D convolution and a low-resolution depth map is output, whose depth information initializes the preset depth of the next stage; after three stages the final predicted depth map is obtained and the three-dimensional reconstruction is completed. The invention provides a new formulation for multi-view cost-volume fusion, improves the usability of matched pixels and reduces the interference of unmatched pixels, thereby obtaining a more accurate depth estimate and improving the completeness of the reconstructed three-dimensional point cloud.
Description
Technical Field
The invention belongs to the technical field of three-dimensional reconstruction, and particularly relates to a three-dimensional reconstruction method considering a multi-view fusion strategy.
Background
With the continuous growth of imaging demands and rising requirements on scene understanding, three-dimensional spatial vision has become increasingly important: observation in three-dimensional space recovers scene structures and detail differences that are missing in a two-dimensional plane, making the user experience more real and reliable. Conventional three-dimensional reconstruction devices can reconstruct the three-dimensional space of pictures shot by a camera, but their cost is very high, which is prohibitive for daily use, so such devices cannot be popularized to the public.
Multi-view stereo (MVS) aims to recover 3D scene geometry from a set of RGB images with known camera poses, obtaining a 3D dense model of a real-world scene from multiple images. It has many important applications, such as document reconstruction, virtual reality, autonomous driving and defect detection. Compared with conventional MVS approaches that rely on hand-crafted matching metrics for image consistency checking, deep-learning-based MVS methods typically use fronto-parallel plane sweeping to evaluate the same set of candidate depths for every pixel, and achieve higher accuracy and completeness on many MVS benchmarks than the prior art. While learning-based MVS achieves remarkable results, many aspects remain to be solved and optimized to further improve the quality of point cloud reconstruction. Convolutional neural networks (CNNs) have been widely used for three-dimensional reconstruction and broader computer vision tasks; recent multi-view stereo matching algorithms typically compute a 3D cost volume over a set of hypothesized depths and apply 3D convolutions to the cost volume to regularize it and regress the final scene depth.
Several studies have demonstrated the importance of cost-volume construction to depth prediction accuracy. In the prevailing MVS pipeline, N-1 source maps and 1 reference map are input, two-view cost volumes are constructed through the homography relation between each source map and the reference map, the N-1 two-view cost volumes are then compressed into a final cost volume by a specific fusion scheme, and finally 3D CNN or RNN regularization is used for depth prediction. One very important issue is the judgment of pixel visibility in each view and the selection of views. The disturbance of irrelevant views and the introduction of erroneous pixels can render the verification of the final score ineffective or even harmful, resulting in erroneous depth estimation.
The original MVSNet uses a view fusion strategy in the traditional sense: it computes the variance distribution across the different two-view cost volumes and uses the variances as weights to combine the per-view cost volumes, and a number of variant algorithms follow this scheme. Other methods apply averaging or max pooling to aggregate matching costs, or directly use specific implicit network layers to obtain so-called adaptive weight maps that lack a reasonable theoretical explanation. Although a network may implicitly learn how to discard unmatched pixels in a view, the interference of those unmatched pixels still inevitably deteriorates the final reconstruction.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a three-dimensional reconstruction method considering a multi-view fusion strategy, so as to solve the problem in existing three-dimensional reconstruction that the interference of irrelevant views and the propagation of erroneous pixels render the verification of the final score ineffective or even harmful, causing erroneous depth estimation.
In order to achieve the above purpose, the invention adopts the following technical scheme:
a three-dimensional reconstruction method taking into account a multi-view fusion strategy, comprising the steps of:
s1, acquiring a plurality of pictures acquired by a camera;
s2, extracting features of the pictures to obtain three different feature maps;
s3, mapping the N-1 source feature maps of a stage onto the hypothetical depth planes of the reference view by homography transformation;
s4, respectively constructing N-1 initial cost volumes based on the N-1 source feature maps and a reference map;
s5, performing adaptive weight training on the N-1 initial cost volumes through View-Net to obtain a weight map corresponding to each initial cost volume;
s6, carrying out weighted fusion on the weight maps and the initial cost volumes to obtain the cost volume to be regularized;
s7, regularizing the cost volume of step S6 to obtain a probability map, and generating the depth map of this stage based on the probability map;
s8, taking the depth map of this stage in step S7 as the initialization of the depth preset of the second stage, and circularly executing steps S3 to S7 to generate the depth map of the second stage;
s9, taking the depth map obtained in the second stage as the initialization of the depth preset of the third stage, and circularly executing steps S3 to S7 to generate the final predicted depth map of the third stage;
and S10, based on the final predicted depth map, completing the three-dimensional reconstruction of the cost volume.
Further, in step S3, the N-1 source feature maps of a stage are mapped onto the hypothetical depth planes of the reference view by homography transformation, obtaining the implicit depth relationship between images of different view angles:

H_i(d) = K_i * R_i * (I - (t_0 - t_i) * n_0^T / d) * R_0^T * K_0^(-1)

wherein H_i(d) is the homography between the feature map of the i-th view and the reference feature map at depth d; K_i, R_i, t_i are the camera intrinsics, rotation and translation of the i-th view; n_0 is the principal axis of the reference camera; and I, K_0, R_0, t_0 are the identity matrix and the intrinsics, rotation and translation of the reference camera, respectively.
Further, the step S4 specifically includes:
according to 1 reference feature map, N-1 source feature maps and the corresponding camera and pose parameters, each of the N-1 source feature maps is homography-warped onto a preset plane of the reference camera, and N-1 initial cost volumes are constructed.
Further, in step S5, the N-1 initial cost volumes are subjected to adaptive weight training through View-Net to obtain the weight map corresponding to each initial cost volume:

V(x) = 1 / (1 + exp(-x))

wherein V(x) is the view weight map and exp(-x) denotes e raised to the power -x.
Further, in step S6, the weight maps and the initial cost volumes are weightedly fused to obtain the cost volume to be regularized:

V_total(k) = sum_{n=1..N-1} View_n(k) * Warp_n(k) / sum_{n=1..N-1} View_n(k)

wherein V_total(k) is the fused cost volume of the k-th stage, k in {1,2,3}; Warp_n(k) is the initial cost volume after homography warping of the n-th source view against the reference view in the k-th stage; and View_n(k) is the one-dimensional weight information output by View-Net for the n-th initial cost volume in the k-th stage.
Further, the smooth L1 Loss is used to calculate the mean absolute difference between the true depth map and the estimated depth map, and the losses of the three stages are accumulated as the final Loss:

Loss = sum_{k=1..3} sum_{p in Omega} SmoothL1( d(p,k) - d'(p,k) )

wherein k is the cascade stage, Omega is the set of valid ground-truth pixels, d(p,k) is the true depth value of pixel p at the k-th stage, and d'(p,k) is the predicted depth value of pixel p at the k-th stage.
The three-dimensional reconstruction method considering the multi-view fusion strategy provided by the invention has the following beneficial effects:
the invention provides a general aggregation network, which uses a general sub-network serving as an MVS pipeline network for training pixel-level confidence of a two-view cost body, and provides a new expression mode for multi-view cost body fusion, so as to improve the usability of matched pixels and reduce the interference of unmatched pixels, further obtain a more accurate depth map and improve the integrity of three-dimensional point cloud reconstruction.
Drawings
FIG. 1 is a flow chart of the present invention.
Fig. 2 is a network architecture of the present invention.
FIG. 3 is a hypothetical depth plane for mapping N-1 source signatures to a reference view by homography transformation in accordance with the present invention.
FIG. 4 is a schematic diagram of a two-view fusion strategy according to the present invention.
FIG. 5 is a two-view weighting module according to the present invention.
Fig. 6 shows, for Scan1 of the DTU test set, the fusion weights obtained by the two-view cost volumes after passing through View-Net.
FIG. 7 plots the depth-map error at different precision levels as the number of input views increases.
Detailed Description
The following description of the embodiments of the present invention is provided to facilitate understanding by those skilled in the art; it should be understood, however, that the invention is not limited to the scope of these embodiments, and all inventions making use of the inventive concept fall within the spirit and scope of the invention as defined in the appended claims.
Example 1
This embodiment provides a three-dimensional reconstruction method considering a multi-view fusion strategy, which addresses the deteriorating factors introduced during cost-volume fusion by the uncertainty of view selection and pixel homography warping in existing methods. Referring to fig. 1, the method specifically comprises the following steps:
s1, acquiring a plurality of pictures acquired by a camera;
referring to fig. 2, the input of the present embodiment includes a plurality of captured pictures and corresponding camera parameters.
S2, extracting features of the pictures to obtain three different feature images;
in this embodiment, an FPN architecture is adopted as the feature extraction layer to obtain three stage feature maps of different sizes (N x C x H x W);
the present embodiment assumes that the depth is uniformly sampled (e.g., 1-192 mm) from a range of depths throughout all stages. The first stage acquires image features at low resolution and constructs a cost volume through homography mapping at a predetermined depth range and larger depth interval, and the subsequent stage uses high spatial resolution, narrower depth range and smaller depth interval to obtain a finer depth prediction map.
In this embodiment, the sub-network is described taking the first stage as an example, where the 0th input is the reference feature map and the remaining N-1 inputs are source feature maps. The N-1 source feature maps are each homography-warped against the reference feature map to construct cost volumes, and each cost volume is passed into View-Net to obtain a weight map of that view relative to the reference map; the weight map suppresses non-matching information in the view and strengthens transferable information, similar in function to an attention mechanism. The procedure specifically comprises steps S3 to S6:
s3, mapping the N-1 source feature images in one stage onto an assumed depth plane of the reference view by adopting homography transformation;
referring to fig. 3, according to the epipolar principle, N-1 source feature maps are mapped onto d hypothetical planes of a reference view through homography transformation warping, thereby obtaining implicit depth relationships between images of different views.
The coordinate mapping is determined by the homography:

H_i(d) = K_i * R_i * (I - (t_0 - t_i) * n_0^T / d) * R_0^T * K_0^(-1)

wherein H_i(d) is the homography between the feature map of the i-th view and the reference feature map at depth d; K_i, R_i, t_i are the camera intrinsics, rotation and translation of the i-th view, and n_0 is the principal axis of the reference camera. The 2D feature maps are then warped onto the hypothetical planes of the reference camera using this differentiable homography transformation to form a plurality of two-view cost volumes.
S4, respectively constructing N-1 initial cost bodies based on the N-1 source characteristic images and a reference view;
referring to FIG. 3, 1 reference feature map and N-1 source feature map are input, and the corresponding camera and pose parameters are respectively input by the method of the first pair ofZhang Yuan characteristic diagram is subjected to homography distortion transformation to a preset plane of a reference camera, and N-1 initial cost bodies are obtained through construction.
In step S5, the two-view weight module of this embodiment performs adaptive weight training on the N-1 initial cost volumes through View-Net to obtain a weight map (1 x H x W) corresponding to each initial cost volume.
referring to FIG. 5, the input of the view-Net module is a two-view cost volume, the number of channels is reduced to 1/2 of the original number through a Conv3d layer, and a BatchNorm3d-Relu is used as the active layer of the layer for fast convergence in the initial stage. The number of channels is then reduced to one dimension (1×d×h×w) by one Conv3D layer. The above operation can be considered here as a temporary cost volume x, and is converted into a final usefulness map (view weight map) V (x) (1×h×w) by the following formula.
Of course, to match the coarse-to-fine training strategy, three independent View-Nets are set up for the three cascade stages; their specific parameters are shown in Table 1.

TABLE 1. View-Net settings of the three stages. The original input resolution is 512 x 640, and the input of each stage is in turn 1/4, 1/2 and 1 times the original resolution.
In step S6, referring to FIG. 4, the two-view fusion strategy of this embodiment performs weighted fusion of the weight maps and the initial cost volumes to obtain the cost volume to be regularized.

Each weight map is unsqueezed (dimension-expanded), multiplied with its cost volume, and the products are summed; the sum is then divided by the accumulated weights to obtain the fused final cost volume:

V_total(k) = sum_{n=1..N-1} View_n(k) * Warp_n(k) / sum_{n=1..N-1} View_n(k)

wherein Warp_n(k) is the initial cost volume after homography warping of the n-th source view against the reference view in the k-th stage, View_n(k) is the one-dimensional weight information output by View-Net for the n-th initial cost volume in the k-th stage, and V_total(k) is the fused cost volume of the k-th stage, k in {1,2,3}.
By adopting this fusion strategy, the sampling of effective matching pixels under different source views is improved and the interference of non-matching pixels is suppressed, finally yielding a fusion weight map of each source view against the reference view; all two-view cost volumes are fused with pixel-level weights, reducing the interference information in the final cost volume.
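The weighted fusion of step S6 amounts to a per-pixel weighted average of the two-view cost volumes. The sketch below follows the formula above; the function name, the epsilon guard against a zero weight sum, and the array layout are illustrative assumptions.

```python
import numpy as np

def fuse_cost_volumes(warped, weights, eps=1e-6):
    """Pixel-wise weighted average of the N-1 two-view cost volumes:

        V_total = sum_n View_n * Warp_n / sum_n View_n

    warped:  (N-1, C, D, H, W) homography-warped two-view cost volumes
    weights: (N-1, H, W) View-Net weight maps; the indexing below is the
             'unsqueeze' step, broadcasting each map over C and D
    """
    w = weights[:, None, None, :, :]                  # (N-1, 1, 1, H, W)
    return (w * warped).sum(axis=0) / (w.sum(axis=0) + eps)
```

A view whose weight map is near zero at some pixel then contributes almost nothing to the fused cost at that pixel, which is the suppression of non-matching pixels described above.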
Step S7, regularizing the cost body in the step S6 to obtain a probability map, and generating a depth map of the stage based on the probability map;
the regularization processing in this step obtains a probability map and generates a predicted depth map in this stage based on the probability map directly by conventional means, so that specific processes are not repeated in this embodiment.
Step S8, taking the depth map of the stage in the step S7 as the initialization of the depth presetting of the second stage, and circularly executing the steps S3 to S7 to generate a predicted depth map of the second stage;
step S9, taking the depth map obtained in the second stage as the initialization of the depth presetting in the third stage, and circularly executing the steps S3 to S7 to generate a final predicted depth map obtained in the third stage;
step S10, based on the final predicted depth map, completing three-dimensional reconstruction of the cost body;
the step is based on the final predicted depth map to complete three-dimensional reconstruction of the cost body, and conventional means in the field are adopted, so that detailed processes thereof are not repeated.
In this embodiment, the loss is calculated on the outputs of the three cascade stages: each stage calculates the mean absolute difference between the real depth map and the estimated (predicted) depth map using the smooth L1 Loss, and the three stage losses are accumulated as the final Loss:

Loss = sum_{k=1..3} sum_{p in Omega} SmoothL1( d(p,k) - d'(p,k) )

wherein k is the cascade stage, Omega is the set of valid ground-truth pixels, d(p,k) is the true depth value of pixel p at the k-th stage, and d'(p,k) is the predicted depth value of pixel p at the k-th stage.
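The accumulated loss can be sketched as follows, assuming the standard smooth-L1 definition with beta = 1 (the patent does not state beta); the function names and the per-stage averaging are illustrative assumptions.

```python
import numpy as np

def smooth_l1(diff, beta=1.0):
    """Standard smooth-L1: quadratic below beta, linear above."""
    a = np.abs(diff)
    return np.where(a < beta, 0.5 * a * a / beta, a - 0.5 * beta)

def cascade_loss(preds, gts, masks):
    """Accumulate the mean smooth-L1 depth error of the cascade stages
    over the valid ground-truth pixels, as described above.

    preds, gts: lists of (H, W) depth maps, one per stage
    masks:      lists of boolean (H, W) validity masks (the set Omega)
    """
    return sum(smooth_l1(p[m] - g[m]).mean()
               for p, g, m in zip(preds, gts, masks))
```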
Example 2
This example is used for performing evaluation verification of the method steps in example 1;
specifically, this embodiment trains on the DTU training set. In the training phase, the number of input images is set to N=3 and the image resolution is 512 x 640, in accordance with conventional MVS practice. For coarse-to-fine regularization, depth hypotheses are sampled from 425 mm to 935 mm; the numbers of plane-sweep depth hypotheses of the three stages are 48, 32 and 8, respectively, and the corresponding depth intervals decay by 0.25 and 0.5 from the coarsest stage to the finer stages. The model is trained with Adam for 16 epochs with an initial learning rate of 0.001, decayed by a factor of 0.5 after epochs 6, 8 and 12, respectively, using a batch size of 2 on one NVIDIA RTX 3090 GPU; one batch occupies 6 GB of memory.
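The learning-rate schedule just described (initial 0.001, halved after epochs 6, 8 and 12) can be expressed as a small step-decay helper; the function name is an illustrative assumption.

```python
def lr_at_epoch(epoch, base_lr=0.001, milestones=(6, 8, 12), gamma=0.5):
    """Step-decay schedule: the learning rate is multiplied by gamma
    once each milestone epoch has been reached."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr
```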
The proposed method is evaluated on the evaluation set of the DTU dataset with the official evaluation criterion; in the evaluation phase, N=5 and the input resolution is 864 x 1152. The quantitative evaluation is shown in Table 2.

TABLE 2. Quantitative evaluation on the DTU dataset (lower is better). The method herein outperforms the baseline network and most other advanced methods in terms of completeness.
The partial multi-view stereo qualitative results on the DTU dataset corresponding to the table above show that the point clouds reconstructed by the method of the present invention are significantly denser and more complete.
To verify the advantages of the proposed fusion strategy and to represent its contribution to MVS visually, the visibility map of each two-view pair after View-Net is visualized in FIG. 6. It is evident from the figure that different viewing angles contribute differently to the reference view: a lighter area indicates that this region matches more pixels of the reference view, while a darker area implies that the region bears little relationship to the reference view. In other words, the amount of usable information obtained when observing the same object from different perspectives differs, which is what allocates the importance of the homography warping of different pixels in the reference view.
from Vis-MVSNet, a good multi-view fusion strategy should be unaffected by the number of views, and not result in a drop in results due to the addition of source views. Performance tests using different numbers of source views on large and small scene data sets, respectively, are therefore performed for the proposed fusion strategy.
Verification tests were performed on the DTU dataset and the BlendedMVS dataset, respectively; the results are shown in Table 3 and FIG. 7. The experiments show that as the number of views increases, the depth estimation accuracy of the method gradually improves; the improvement is especially obvious in the high-accuracy ranges such as 2 mm and 4 mm.
TABLE 3. From left to right, the depth-map errors at different precision thresholds when introducing 2 to 8 views, respectively.
The above verification highlights the filtering of image information by the proposed method, which avoids computation on invalid image pixels and improves the estimation of correct pixels, thereby improving the completeness of the reconstructed point cloud.
Although specific embodiments of the invention have been described in detail with reference to the accompanying drawings, it should not be construed as limiting the scope of protection of the present patent. Various modifications and variations which may be made by those skilled in the art without the creative effort are within the scope of the patent described in the claims.
Claims (6)
1. The three-dimensional reconstruction method taking the multi-view fusion strategy into consideration is characterized by comprising the following steps of:
s1, acquiring a plurality of pictures acquired by a camera;
s2, extracting features of the pictures to obtain three different feature maps;
s3, mapping the N-1 source feature maps of a stage onto the hypothetical depth planes of the reference view by homography transformation;
s4, respectively constructing N-1 initial cost volumes based on the N-1 source feature maps and a reference view;
s5, performing adaptive weight training on the N-1 initial cost volumes through View-Net to obtain a weight map corresponding to each initial cost volume;
s6, carrying out weighted fusion on the weight maps and the initial cost volumes to obtain the cost volume to be regularized;
s7, regularizing the cost volume of step S6 to obtain a probability map, and generating the depth map of this stage based on the probability map;
s8, taking the depth map of this stage in step S7 as the initialization of the depth preset of the second stage, and circularly executing steps S3 to S7 to generate the depth map of the second stage;
s9, taking the depth map obtained in the second stage as the initialization of the depth preset of the third stage, and circularly executing steps S3 to S7 to generate the final predicted depth map of the third stage;
and S10, based on the final predicted depth map, completing the three-dimensional reconstruction of the cost volume.
2. The three-dimensional reconstruction method considering a multi-view fusion strategy according to claim 1, wherein in the step S3, homography transformation is adopted to map the N-1 source feature maps of a stage onto the hypothetical depth planes of the reference view, obtaining the implicit depth relationship between images of different view angles:

H_i(d) = K_i * R_i * (I - (t_0 - t_i) * n_0^T / d) * R_0^T * K_0^(-1)

wherein H_i(d) is the homography between the feature map of the i-th view and the reference feature map at depth d; K_i, R_i, t_i are the camera intrinsics, rotation and translation of the i-th view; n_0 is the principal axis of the reference camera; and I, K_0, R_0, t_0 are the identity matrix and the intrinsics, rotation and translation of the reference camera, respectively.
3. The three-dimensional reconstruction method according to claim 2, wherein the step S4 specifically comprises:
according to 1 reference feature map, N-1 source feature maps and the corresponding camera and pose parameters, each of the N-1 source feature maps is homography-warped onto a preset plane of the reference camera, and N-1 initial cost volumes are constructed.
4. The three-dimensional reconstruction method considering the multi-View fusion strategy according to claim 1, wherein in the step S5, the adaptive weight training is performed on N-1 initial cost volumes through View-Net, so as to obtain a weight graph corresponding to the initial cost volumes:
where V(x) is the view weight map and exp(·) denotes the exponential function with base e.
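The claim's exact V(x) expression is not reproduced in the text; the sketch below shows one plausible exp-based per-pixel weight consistent with the definition of exp(·), where a low matching cost yields a weight near 1 (this specific cost reduction, and the function itself, are assumptions, not the patent's View-Net):

```python
import numpy as np

def view_weight(cost_volume):
    """One plausible exp-based per-pixel view weight for an initial cost
    volume of shape (D, H, W, C): reduce over depth and channel dims to a
    per-pixel score, then map low cost -> weight near 1, high cost -> near 0."""
    score = cost_volume.mean(axis=(0, 3))  # (H, W) per-pixel matching cost
    return np.exp(-score)                  # V(x) in (0, 1]
```

In the full method this hand-crafted reduction is replaced by a learned network, but the output plays the same role: a per-pixel confidence for each source view.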
5. The three-dimensional reconstruction method considering the multi-view fusion strategy according to claim 4, wherein in the step S6, the weight maps and the initial cost volumes are fused by weighting to obtain the cost volume to be regularized:
V_total^(k) = Σ_{n=1}^{N-1} View_n^(k) · Warp_n^(k)

wherein V_total^(k) is the fused cost volume of the k-th stage, k ∈ {1, 2, 3}; Warp_n^(k) is the initial cost volume obtained by homographic warping between the n-th source view and the reference view in the k-th stage; and View_n^(k) is the one-dimensional weight information output by View-Net for the n-th initial cost volume in the k-th stage.
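A sketch of this weighted fusion, assuming a normalized weighted sum over the N-1 volumes (the normalization by the summed weights is an assumption; the claim may intend an unnormalized sum):

```python
import numpy as np

def fuse_cost_volumes(volumes, weights, eps=1e-8):
    """Fuse N-1 initial cost volumes (each (D, H, W, C)) into the single
    cost volume to be regularized, using per-pixel (H, W) view weights.
    eps guards against division by zero where all weights vanish."""
    num = np.zeros_like(volumes[0])
    den = eps
    for vol, w in zip(volumes, weights):
        w4 = w[None, :, :, None]  # broadcast per-pixel weight over depth and channels
        num = num + w4 * vol
        den = den + w4
    return num / den
```

With equal weights the fusion reduces to a plain average, so identical input volumes pass through unchanged, which is a convenient correctness check.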
6. The three-dimensional reconstruction method considering the multi-view fusion strategy according to claim 1, wherein a smooth L1 loss is adopted to calculate the mean absolute difference between the ground-truth depth map and the estimated depth map, and the losses of the three stages are accumulated as the final loss.
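A sketch of the per-stage smooth L1 loss and the accumulated final loss (the beta threshold and equal stage weights are assumptions; the claim does not state them):

```python
import numpy as np

def smooth_l1(pred, gt, beta=1.0):
    """Smooth L1 (Huber-style) loss between the estimated and ground-truth
    depth maps, averaged over all pixels: quadratic for small residuals,
    linear for large ones."""
    diff = np.abs(pred - gt)
    per_pixel = np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)
    return per_pixel.mean()

def total_loss(preds, gts, stage_weights=(1.0, 1.0, 1.0)):
    """Final loss: accumulated smooth L1 losses of the three stages.
    Equal stage weights are an assumption."""
    return sum(w * smooth_l1(p, g)
               for w, p, g in zip(stage_weights, preds, gts))
```

For a residual of 2 with beta = 1, the per-pixel loss is 2 - 0.5 = 1.5, in the linear regime of the loss.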
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310315104.0A CN116310131A (en) | 2023-03-28 | 2023-03-28 | Three-dimensional reconstruction method considering multi-view fusion strategy |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116310131A true CN116310131A (en) | 2023-06-23 |
Family
ID=86816681
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310315104.0A Pending CN116310131A (en) | 2023-03-28 | 2023-03-28 | Three-dimensional reconstruction method considering multi-view fusion strategy |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116310131A (en) |
2023-03-28: Application CN202310315104.0A filed in China; patent CN116310131A published, status pending.
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117437363A (en) * | 2023-12-20 | 2024-01-23 | 安徽大学 | Large-scale multi-view stereoscopic method based on depth perception iterator |
CN117437363B (en) * | 2023-12-20 | 2024-03-22 | 安徽大学 | Large-scale multi-view stereoscopic method based on depth perception iterator |
CN117671163A (en) * | 2024-02-02 | 2024-03-08 | 苏州立创致恒电子科技有限公司 | Multi-view three-dimensional reconstruction method and system |
CN117671163B (en) * | 2024-02-02 | 2024-04-26 | 苏州立创致恒电子科技有限公司 | Multi-view three-dimensional reconstruction method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Lin et al. | Dynamic spatial propagation network for depth completion | |
Wang et al. | 360sd-net: 360 stereo depth estimation with learnable cost volume | |
WO2018127007A1 (en) | Depth image acquisition method and system | |
CN103854283B (en) | A kind of mobile augmented reality Tracing Registration method based on on-line study | |
CN116310131A (en) | Three-dimensional reconstruction method considering multi-view fusion strategy | |
CN110070598B (en) | Mobile terminal for 3D scanning reconstruction and 3D scanning reconstruction method thereof | |
CN106023303B (en) | A method of Three-dimensional Gravity is improved based on profile validity and is laid foundations the dense degree of cloud | |
CN106023230B (en) | A kind of dense matching method of suitable deformation pattern | |
CN115205489A (en) | Three-dimensional reconstruction method, system and device in large scene | |
CN110956661B (en) | Method for calculating dynamic pose of visible light and infrared camera based on bidirectional homography matrix | |
CN109544628B (en) | Accurate reading identification system and method for pointer instrument | |
CN114067197B (en) | Pipeline defect identification and positioning method based on target detection and binocular vision | |
CN109859137B (en) | Wide-angle camera irregular distortion global correction method | |
CN111784778A (en) | Binocular camera external parameter calibration method and system based on linear solving and nonlinear optimization | |
CN110910456B (en) | Three-dimensional camera dynamic calibration method based on Harris angular point mutual information matching | |
CN113744337A (en) | Synchronous positioning and mapping method integrating vision, IMU and sonar | |
CN110033461B (en) | Mobile phone anti-shake function evaluation method based on target displacement estimation | |
CN113393439A (en) | Forging defect detection method based on deep learning | |
CN114119739A (en) | Binocular vision-based hand key point space coordinate acquisition method | |
CN116129037B (en) | Visual touch sensor, three-dimensional reconstruction method, system, equipment and storage medium thereof | |
CN112150518B (en) | Attention mechanism-based image stereo matching method and binocular device | |
CN115601406A (en) | Local stereo matching method based on fusion cost calculation and weighted guide filtering | |
CN113963117A (en) | Multi-view three-dimensional reconstruction method and device based on variable convolution depth network | |
CN115359127A (en) | Polarization camera array calibration method suitable for multilayer medium environment | |
CN111062900B (en) | Binocular disparity map enhancement method based on confidence fusion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||