CN116310098A - Multi-view three-dimensional reconstruction method based on attention mechanism and deformable convolution depth network - Google Patents

Multi-view three-dimensional reconstruction method based on attention mechanism and deformable convolution depth network

Info

Publication number: CN116310098A
Authority: CN (China)
Prior art keywords: depth, network, map, view, feature
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202310185694.XA
Other languages: Chinese (zh)
Inventors: 周贵云 (Zhou Guiyun), 周祥 (Zhou Xiang)
Current Assignee: University of Electronic Science and Technology of China (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: University of Electronic Science and Technology of China
Priority date: 2023-03-01 (the priority date is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed)
Filing date: 2023-03-01
Publication date: 2023-06-23
Application filed by University of Electronic Science and Technology of China
Priority to CN202310185694.XA
Publication of CN116310098A

Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
                • G06N 3/00 Computing arrangements based on biological models
                    • G06N 3/02 Neural networks
                        • G06N 3/08 Learning methods
            • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
                • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
                • G06T 5/00 Image enhancement or restoration
                    • G06T 5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
                • G06T 7/00 Image analysis
                    • G06T 7/50 Depth or shape recovery
                        • G06T 7/55 Depth or shape recovery from multiple images
                • G06T 2207/00 Indexing scheme for image analysis or image enhancement
                    • G06T 2207/10 Image acquisition modality
                        • G06T 2207/10028 Range image; Depth image; 3D point clouds
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
        • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
            • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
                • Y02T 10/00 Road transport of goods or passengers
                    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
                        • Y02T 10/40 Engine management systems


Abstract

The invention provides a three-dimensional reconstruction method based on deep learning, comprising the following steps: constructing a multi-scale feature extraction network and extracting multi-scale semantic features from the acquired multi-view images to obtain target key features at multiple scales; performing pixel-level iterative depth evaluation in a block matching iterative model based on an attention mechanism and a deformable convolution depth network, and outputting a depth map for each image once the iteration completes; extracting depth residuals through a residual network to optimize the depth map and obtain the final depth map; and filtering depth points through geometric-consistency and illumination-consistency constraints, fusing the depth maps according to a visibility-based fusion algorithm, and outputting the three-dimensional point cloud of the scene. The attention mechanism purifies the target features and depth evaluation is optimized through the deformable convolution depth network, so the resulting depth maps are more accurate and both the completeness and the accuracy of the three-dimensional reconstruction model are improved.

Description

Multi-view three-dimensional reconstruction method based on attention mechanism and deformable convolution depth network
Technical Field
The invention belongs to the field of geometric three-dimensional reconstruction in computer vision, and relates to a multi-view three-dimensional reconstruction method based on an attention mechanism and a deformable convolution depth network.
Background
In recent years, deep learning has been widely applied in computer vision with great success, showing excellent performance on two-dimensional image tasks such as image classification, object detection, and semantic segmentation. However, an image carries only two-dimensional information and can convey a limited amount of it, so recovering richer three-dimensional information from two-dimensional images has gradually become one of the hot research problems.
Multi-view three-dimensional reconstruction recovers a three-dimensional scene from multiple images of the same scene taken from different viewpoints. Conventional multi-view three-dimensional reconstruction techniques fall into four categories: voxel-based, surface-evolution-based, feature-region-expansion-based, and depth-map-based. Depth-map-based methods are simple and flexible, but conventional depth-map methods struggle to reconstruct weakly textured or specular regions, and their reconstructions are incomplete and slow. Reconstruction methods based on deep learning make better use of global semantic information, significantly improving both the accuracy and the speed of reconstruction.
In multi-view reconstruction methods based on deep learning, a three-dimensional cost volume is generally constructed and depth values of the scene are obtained by depth regression. However, the three-dimensional convolutional regularization used in depth regression exhausts limited GPU memory when processing large-scale or high-resolution scenes. Moreover, the environmental factors affecting the reconstruction of outdoor large-scale scene models are complex: scene features are hard to extract comprehensively, parameter adaptivity is poor, models can only be built for specific scenes, and generalization capability is weak.
In view of the foregoing, these problems urgently need to be solved.
Disclosure of Invention
Addressing the above problems and needs, the present invention provides a multi-view three-dimensional reconstruction method based on an attention mechanism and a deformable convolution depth network, which solves the above technical problems through the following technical scheme.
To achieve the above purpose, the present invention provides the following technical solution: a multi-view three-dimensional reconstruction method based on an attention mechanism and a deformable convolution depth network, comprising the following steps:
step1: constructing a multi-scale feature extraction network based on a recursive feature pyramid structure, and extracting multi-scale semantic features from the acquired multi-view images to obtain complete and accurate target key features;
step2: inputting the features obtained in step1 into a block matching iterative model based on an attention mechanism and a deformable convolution depth network for pixel-level iterative depth evaluation, and outputting a depth map for each image once the iteration completes;
step3: taking the depth map obtained in step2 and the corresponding image as input, extracting depth residuals through a residual network, and optimizing the depth map to obtain the final depth map;
step4: taking all depth maps obtained by the iterative computation as input, filtering depth points through geometric-consistency and illumination-consistency constraints, fusing the depth maps according to a visibility-based fusion algorithm, and outputting the three-dimensional point cloud of the scene.
The recursive feature pyramid mechanism applies an atrous spatial pyramid pooling (ASPP) network to the feature maps output by the original feature pyramid structure to extract target key features, and feeds the obtained target features back into the feature pyramid network to purify them a second time, yielding feature maps with richer information.
During the iteration of the block matching model based on the attention mechanism and the deformable convolution depth network, the first iteration takes the target feature map as the initial input; each subsequent iteration concatenates the target feature map with the depth map output by the previous iteration as its input. In the depth evaluation itself, the target features are first enhanced by the attention mechanism, and the cost of each depth hypothesis of each pixel is then computed through group-wise correlation, as follows:
firstly, the key features of the reference image are strengthened by the attention mechanism, so that the iterative computation focuses on key targets in the image; pixels of each source view are then transformed into the reference-view coordinate system by homography, and the dot product of the reference view and the source view is computed in that coordinate system to obtain the image similarity; the feature channels are divided into groups, and each group similarity is projected by a three-dimensional convolutional network into a cost value for each depth hypothesis of each pixel of the reference image; the depth cost values of all source views are then sum-normalized to obtain the probability of each depth hypothesis of each pixel, and depth regression outputs a depth map; finally, a deformable convolution depth network samples the depth values of pixels similar to each pixel and applies depth filtering to smooth away erroneous depth estimates.
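To make the homography step concrete, the sketch below is a minimal PyTorch implementation of plane-sweep warping under assumed conventions; the function name warp_src_to_ref, the tensor layouts, and the pinhole intrinsics/extrinsics parameterization are illustrative assumptions, not taken from the patent:

```python
import torch
import torch.nn.functional as F

def warp_src_to_ref(src_feat, K_src, K_ref, R, t, depth_hyps):
    """Warp source-view features into the reference view for every depth hypothesis.

    src_feat: [B, C, H, W]; K_src, K_ref: [B, 3, 3]; R: [B, 3, 3]; t: [B, 3, 1];
    depth_hyps: [B, D, H, W]. Returns warped features of shape [B, C, D, H, W].
    """
    B, C, H, W = src_feat.shape
    D = depth_hyps.shape[1]
    # Homogeneous pixel grid of the reference view: [3, H*W]
    y, x = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                          torch.arange(W, dtype=torch.float32), indexing="ij")
    pix = torch.stack([x, y, torch.ones_like(x)], 0).reshape(3, -1).to(src_feat.device)
    # Back-project each pixel to 3D at every hypothesised depth, then project
    # into the source view: K_src (R X + t), with X = d * K_ref^{-1} pix
    cam = torch.inverse(K_ref) @ pix                               # [B, 3, H*W]
    cam = cam.unsqueeze(1) * depth_hyps.reshape(B, D, 1, H * W)    # [B, D, 3, H*W]
    proj = K_src.unsqueeze(1) @ (R.unsqueeze(1) @ cam + t.unsqueeze(1))
    xy = proj[:, :, :2] / proj[:, :, 2:3].clamp(min=1e-6)          # [B, D, 2, H*W]
    # Normalise to [-1, 1] and bilinearly resample the source features
    gx = 2.0 * xy[:, :, 0] / (W - 1) - 1.0
    gy = 2.0 * xy[:, :, 1] / (H - 1) - 1.0
    grid = torch.stack([gx, gy], dim=-1).reshape(B, D * H, W, 2)
    warped = F.grid_sample(src_feat, grid, align_corners=True)     # [B, C, D*H, W]
    return warped.reshape(B, C, D, H, W)
```

The dot product of reference features with these warped source features then yields the image similarity described above.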
The optimizing residual network takes the target feature map and the obtained depth map as input, extracts depth residual information, and optimizes the depth map.
Depth map fusion projects the depth points that satisfy the geometric-constraint and illumination-consistency requirements into three-dimensional space to form a three-dimensional point cloud, yielding the reconstructed three-dimensional model.
From the above technical scheme, the beneficial effects of the invention are as follows: target key features with richer semantics are obtained through the recursive feature pyramid structure; the attention mechanism fuses global semantic information into the depth evaluation to compensate for the limitations of the convolutional network; and the depth map is filtered through a deformable convolution network. The resulting depth maps are more complete and accurate, the three-dimensional model is reconstructed with higher precision, and the whole network model generalizes better.
Drawings
FIG. 1 is a schematic diagram showing the steps in an embodiment of the present invention;
FIG. 2 is a diagram of a network architecture in an embodiment of the present invention;
FIG. 3 is a diagram of a recursive feature pyramid network architecture in an embodiment of the present invention;
FIG. 4 is a diagram of the atrous spatial pyramid pooling network structure in an embodiment of the present invention;
FIG. 5 is a diagram of a network architecture of an attention mechanism in an embodiment of the present invention;
FIG. 6 is a three-dimensional reconstruction effect diagram in an embodiment of the present invention.
Detailed Description
The invention will now be described in further detail with reference to the accompanying drawings; the examples are described only to facilitate understanding of the invention and are not limiting.
Existing multi-view three-dimensional reconstruction methods reconstruct weakly textured regions and reflective surfaces poorly, run slowly, consume much memory, and generalize weakly, so the invention discloses a processing method based on an attention mechanism and a deformable convolution depth network. As shown in fig. 1 and 2, the invention provides a multi-view three-dimensional reconstruction method based on an attention mechanism and a deformable convolution depth network, whose specific steps include:
step1, constructing a multi-scale feature extraction network based on a recursive feature pyramid structure, and extracting multi-scale semantic features of the acquired multi-view images to acquire complete and accurate target key features.
Specifically, as shown in fig. 3, a convolutional neural network built on the recursive feature pyramid structure performs multi-scale feature extraction on the multi-view images to obtain original feature maps. Each low-resolution original feature map is passed through an upsampling layer and fused with the original feature map at the next scale to obtain target features rich in semantic information. Applying this upsampling-and-fusion processing to the original feature maps between every two adjacent scales yields feature maps at multiple scales.
Furthermore, the obtained multi-scale feature maps are purified through an atrous spatial pyramid pooling network, and the purified feature maps are fed into the feature extraction network again to re-extract target key features, yielding multi-scale feature maps with richer semantic information.
In this embodiment, the process of training the multi-scale extraction network through the training set includes:
inputting N H W multi-view images, including a reference view I ref And N-1 source views { I } src Acquiring three-scale original feature images through a trunk feature extraction network, and respectively marking the three-scale original feature images
Figure BDA0004103716460000041
Three-scale original feature map is F 1 ,F 2 ,F 3 ,F 1 Obtaining ∈ through an upsampling network>
Figure BDA0004103716460000042
New feature map F of scale 1 ', F 1 ' and F 2 Fusion to give->
Figure BDA0004103716460000043
Final feature map F of scale 2 '. Similarly, F is 2 ' through the upsampling network and with F 3 Fusion, obtain->
Figure BDA0004103716460000044
Scale target key feature F 3 . As shown in FIG. 4, the pyramid network of the cavity space convolution pool can refine and purify the input feature map, and F is the above 1 、F 2 、F 3 Three feature images are input into a feature image F with more abundant obtained information 1 、F 2 、F 3 . Finally, F 1 、F 2 、F 3 Inputting the features into the trunk feature extraction network again for feature extraction again,acquiring more refined feature map F 1 ”’ 、F 2 ”’ 、F 3 ”’ And F is combined with 1 ”’ And F is equal to 1 、F 2 ”’ And F is equal to 2 、F 3 ”’ And F is equal to 3 And respectively fusing to obtain the output multi-scale characteristics. Compared with the traditional convolutional network for extracting the features, the method has the advantages that the repeated extraction and fusion of the target features are carried out through the recursive feature pyramid network, the feature map with richer semantic information and more accurate target features can be obtained, the output multi-scale features can be used for training a subsequent model at different scales, and the robustness and generalization capability of the whole model network are improved.
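For concreteness, the following is a minimal PyTorch sketch of such a recursive feature pyramid with ASPP purification. It is an illustrative assumption rather than the patent's implementation: the backbone interface, the channel width ch=32, and the additive fusions are all hypothetical choices, and the recursive pass is approximated by adding the refined maps back to the backbone features before a second fusion pass.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Atrous spatial pyramid pooling: parallel dilated 3x3 convs fused by a 1x1 conv."""
    def __init__(self, ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(ch, ch, 3, padding=r, dilation=r) for r in rates)
        self.fuse = nn.Conv2d(ch * len(rates), ch, 1)

    def forward(self, x):
        return self.fuse(torch.cat([F.relu(b(x)) for b in self.branches], dim=1))

class RecursiveFPN(nn.Module):
    """Two fusion passes over backbone features, with ASPP purification in between."""
    def __init__(self, backbone, ch=32):
        super().__init__()
        self.backbone = backbone                 # assumed to return [F1, F2, F3]
        self.aspp = nn.ModuleList(ASPP(ch) for _ in range(3))

    def top_down(self, feats):
        # Upsample each coarser map and add it to the next finer one
        out = [feats[0]]
        for f in feats[1:]:
            up = F.interpolate(out[-1], size=f.shape[-2:], mode="bilinear",
                               align_corners=False)
            out.append(f + up)
        return out

    def forward(self, img):
        feats = self.backbone(img)                          # F1, F2, F3 (coarse to fine)
        first = self.top_down(feats)                        # first pyramid pass
        refined = [a(f) for a, f in zip(self.aspp, first)]  # ASPP purification
        # Recursive pass: refined maps are added back to the backbone features,
        # standing in for feeding them through the backbone a second time
        second = self.top_down([f + r for f, r in zip(feats, refined)])
        return [a + b for a, b in zip(first, second)]       # fuse the two passes
```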
Step2, performing pixel-level iterative depth evaluation in the block matching iterative model based on the attention mechanism and the deformable convolution depth network, and outputting a depth map for each image once the iteration completes.
For multi-view stereo matching, the correct depth value of any pixel of the reference image is also the correct depth value of the corresponding pixel in every source view. Multi-view depth estimation must therefore integrate visual information from multiple source views to estimate each depth value of every pixel of the reference view. To this end, as shown in fig. 5, global semantic information is integrated through the attention mechanism, fully exploiting the spatial-domain relationships of each pixel to strengthen the key features of the target. A weighted calculation based on the grouping operation then produces the cost value of each depth hypothesis of each pixel; the cost volume is regularized by a three-dimensional convolutional network to obtain the probability of each depth hypothesis, and depth regression yields the depth map. Finally, deformable filtering is applied through the deformable convolution network: unlike a fixed filtering window, deformable filtering can collect the coordinates of pixels lying on the same three-dimensional plane, so averaging them effectively smooths unreasonable depth values in the depth map and produces a more accurate result.
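As one plausible form of this attention step, the following sketch uses a CBAM-style spatial attention to produce per-pixel weights over the reference-view features; the module name and design are assumptions for illustration, since the patent does not fix a specific attention architecture here.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """CBAM-style spatial attention over the reference-view features."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, feat):                        # feat: [B, C, H, W]
        avg = feat.mean(dim=1, keepdim=True)        # channel-average map
        mx, _ = feat.max(dim=1, keepdim=True)       # channel-max map
        attn = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return feat * attn                          # re-weighted features w(p) * F(p)
```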
Specifically, in the iterative computation, the first iteration takes the target feature map as the initial input; each subsequent iteration concatenates the target feature map with the depth map output by the previous iteration as its input.
In the present embodiment, uniform sampling is performed over a predefined inverse depth range [1/d_max, 1/d_min], giving each pixel multiple depth hypotheses, and a random number is added to each hypothesis to randomize it. The coordinates (u, v) of pixels similar to each pixel are acquired through the deformable convolutional neural network, and depth sampling at those coordinates produces new candidate hypotheses. This lets the network model handle large-scale complex scenes and improves its generalization capability. For each subsequent iteration, random numbers within a reasonable depth range R are added to the depth map generated by the previous iteration as a random perturbation, and R gradually shrinks as the iterations proceed. This further refines the depth estimate and corrects erroneous estimates from earlier iterations.
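The hypothesis-generation procedure just described can be sketched as follows; the function names and tensor layout are hypothetical, and the jitter scheme is one reasonable reading of "adding a random number to each depth hypothesis":

```python
import torch

def init_hypotheses(d_min, d_max, num_d, shape):
    """First iteration: per-pixel hypotheses uniform in inverse depth, with jitter."""
    B, H, W = shape
    inv = torch.linspace(1.0 / d_max, 1.0 / d_min, num_d).view(1, num_d, 1, 1)
    step = (1.0 / d_min - 1.0 / d_max) / (num_d - 1)
    jitter = (torch.rand(B, num_d, H, W) - 0.5) * step   # random number per hypothesis
    return 1.0 / (inv + jitter).clamp(min=1e-6)          # depths, [B, D, H, W]

def perturb_hypotheses(prev_depth, R, num_d):
    """Later iterations: random perturbation of the previous depth map within +/- R;
    the caller gradually shrinks R as the iterations proceed."""
    B, H, W = prev_depth.shape
    offsets = (torch.rand(B, num_d, H, W) * 2.0 - 1.0) * R
    return (prev_depth.unsqueeze(1) + offsets).clamp(min=1e-6)
```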
The specific process of depth evaluation based on the attention mechanism and the deformable convolution depth network is as follows. The weight information w_i(p) of each pixel p of the reference view is first acquired through the attention mechanism. The group-wise similarity S_i(p, j) between the reference view and source view I_i at pixel p under the j-th depth hypothesis is then computed per feature group g as

S_i(p, j)_g = (G / C) · ⟨ F_0(p)_g , F_i(p_{i,j})_g ⟩,  g = 1, …, G.

A three-dimensional convolutional network projects S_i(p, j) into the cost value of each depth hypothesis of each pixel of the reference view, and depth regression yields the reference-view depth map. Finally, the deformable convolution network acquires the coordinates of pixels lying on the same local plane in three-dimensional space, depth sampling is performed at those coordinates, and the depth values of all sampling points are averaged to obtain the smoothed depth value. The deformable filtering takes the form

d_p = (1 / n) · Σ_{t=1}^{n} d_t,

wherein F_0(p)_g and F_i(p_{i,j})_g denote the g-th group of features of the reference view and of the source view respectively, C is the number of feature channels and G the number of groups, p_{i,j} denotes the pixel of source view I_i corresponding to pixel p of the reference view under depth hypothesis j, N denotes the total number of views, D_i denotes the depth hypotheses for source view I_i, d_p denotes the smoothed depth value of pixel p, d_t denotes the depth value of a sampled pixel lying in the same plane as p, and n denotes the number of sampling points.
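The two formulas above can be read as the following minimal PyTorch sketch; it is an illustrative assumption in which the group-wise inner product is realized as a per-group channel mean (the (G/C)-scaled dot product), and the deformable filtering samples depths at offsets assumed to come from a deformable-convolution offset branch:

```python
import torch
import torch.nn.functional as F

def groupwise_similarity(ref_feat, warped_src, groups):
    """S_i(p, j): per-group inner product of reference and warped source features.

    ref_feat: [B, C, H, W]; warped_src: [B, C, D, H, W] -> returns [B, G, D, H, W].
    The mean over the C/G channels of each group realises the (G/C)-scaled
    inner product of the formula above.
    """
    B, C, D, H, W = warped_src.shape
    ref = ref_feat.view(B, groups, C // groups, 1, H, W)
    src = warped_src.view(B, groups, C // groups, D, H, W)
    return (ref * src).mean(dim=2)

def deformable_depth_filter(depth, offsets):
    """d_p = (1/n) sum_t d_t: average depths sampled at learned per-pixel offsets.

    depth: [B, 1, H, W]; offsets: [B, n, 2, H, W] in normalised [-1, 1] units.
    """
    B, n, _, H, W = offsets.shape
    y, x = torch.meshgrid(torch.linspace(-1, 1, H), torch.linspace(-1, 1, W),
                          indexing="ij")
    base = torch.stack([x, y], dim=0).to(depth.device)         # [2, H, W]
    grid = (base + offsets).permute(0, 1, 3, 4, 2).reshape(B, n * H, W, 2)
    sampled = F.grid_sample(depth, grid, align_corners=True)   # [B, 1, n*H, W]
    return sampled.view(B, 1, n, H, W).mean(dim=2)             # smoothed depth map
```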
Step3, taking the depth map obtained in Step2 and the corresponding image as input, extracting the depth residual through a residual network, and optimizing the depth map to obtain the final depth map.
Specifically, to avoid the estimate being biased at any particular depth scale, the input depth map is normalized to [0, 1].
Further, the reference-view feature map extracted by the multi-scale feature network is upsampled through a deconvolution network, the normalized depth map is upsampled to the H×W scale, the two resulting features are fused, depth residual information is extracted through a residual network, and finally the depth residual information is fused with the depth map obtained in Step2 to obtain the final optimized depth map.
In the experiments, the depth map output at each stage deviates from the ground truth to a different degree; to better reflect the loss variation over the whole training process, the losses of the individual stages are accumulated into the final loss, so the loss function is defined as the sum over all K stages,

L = Σ_{k=1}^{K} L_k,

where L_k measures the difference between the depth map output at stage k and the ground-truth depth.
in this embodiment, the target key feature and the depth map obtained in Step2 are used as inputs, the feature map and the depth map normalized to [0,1] are respectively up-sampled and combined together, depth residual information is extracted through a residual network, the depth residual information and the normalized depth map are fused, and then the fused result is restored to be within the original depth range, so as to obtain the optimized depth map.
Step4, taking all depth maps obtained through the iterative computation as input, filtering depth points through geometric-consistency and illumination-consistency constraints, fusing the depth maps according to a visibility-based fusion algorithm, and outputting the three-dimensional point cloud of the scene.
Specifically, the depth map obtained in Step3 and the probabilities of the corresponding depth hypotheses serve as input; depth points are filtered according to illumination consistency and geometric-constraint consistency, and those that do not meet the threshold conditions are removed. A visibility-based fusion algorithm then minimizes the influence of occlusion, illumination, and similar factors, so that depth occlusions and conflicts between different views are minimized, a unified point-cloud representation is obtained, and the three-dimensional reconstruction result is output.
In this embodiment, the previously computed depth map and the probabilities of the corresponding depth hypotheses are input, an illumination-consistency threshold α and a geometric-constraint-consistency threshold β are preset, and depth points failing either threshold are removed. To further suppress noise, the depth values of the similar pixels that satisfy the thresholds are averaged to obtain the final visible depth. Finally, as shown in fig. 6, the fused depth map is projected directly into three-dimensional space and the three-dimensional reconstructed point-cloud model is output.
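The filtering rule can be sketched as follows, assuming a relative-depth reprojection error as the geometric test and a minimum number of consistent views; the thresholds, names, and NumPy interface are illustrative assumptions.

```python
import numpy as np

def filter_and_fuse(ref_depth, ref_prob, reproj_depths, alpha=0.8, beta=0.01,
                    min_consistent=3):
    """ref_depth, ref_prob: [H, W]; reproj_depths: [V, H, W], the depth of each
    reference pixel re-estimated through each of V source views. Returns a
    keep-mask and noise-suppressed depths averaged over agreeing views."""
    photo_ok = ref_prob > alpha                              # illumination consistency
    rel_err = np.abs(reproj_depths - ref_depth) / np.maximum(ref_depth, 1e-6)
    agree = rel_err < beta                                   # per-view geometric test
    geo_ok = agree.sum(axis=0) >= min_consistent             # geometric consistency
    keep = photo_ok & geo_ok
    # Average the depth values that agree across views to suppress noise
    fused = (reproj_depths * agree).sum(axis=0) / np.maximum(agree.sum(axis=0), 1)
    return keep, np.where(keep, fused, 0.0)
```

The kept depths are then back-projected into three-dimensional space to form the fused point cloud.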
The multi-view three-dimensional reconstruction algorithm based on the attention mechanism and the deformable convolution depth network introduces the attention mechanism on top of the recursive feature pyramid structure to strengthen the key features of the target, makes better use of global semantic information during depth evaluation, and improves depth-evaluation precision; based on a local-planarity assumption, the depth map is optimized through the deformable convolution depth network, which largely resolves the poor reconstruction of weakly textured regions in existing three-dimensional reconstruction algorithms and meets the three-dimensional reconstruction needs of large, complex scenes.
The above results illustrate the feasibility and effectiveness of the proposed three-dimensional reconstruction algorithm based on the attention mechanism and the deformable convolution depth network.
It should be noted that the above detailed description with reference to the drawings is only a preferred embodiment of the invention; various changes can be made within the knowledge of those skilled in the art without departing from the spirit of the invention, and obvious modifications fall within its scope of protection.

Claims (5)

1. A multi-view three-dimensional reconstruction method based on an attention mechanism and a deformable convolution depth network, characterized by comprising the following steps:
step1: constructing a multi-scale feature extraction network based on a recursive feature pyramid structure, and extracting multi-scale semantic features of the acquired multi-view images to acquire complete and accurate target key features;
step2: performing pixel-level iterative depth evaluation in a block matching iterative model based on an attention mechanism and a deformable convolution depth network, and outputting a depth map for each image once the iteration completes;
step3: based on a residual depth optimization module, taking the depth map and a corresponding image as input, extracting a depth residual through a residual network, and optimizing the depth map to obtain a final depth map;
step4: based on a depth map fusion module, taking all depth maps obtained by the iterative computation as input, filtering depth points through geometric-consistency and illumination-consistency constraints, fusing the depth maps according to a visibility-based fusion algorithm, and outputting the three-dimensional point cloud of the scene.
2. The multi-view three-dimensional reconstruction method based on an attention mechanism and a deformable convolution depth network according to claim 1, wherein the recursive feature pyramid structure uses a convolutional neural network to extract multi-scale features from the plurality of multi-view images to obtain original feature maps;
an upsampling layer is applied to each low-resolution original feature map, which is fused with the original feature map at the next scale to obtain target features rich in semantic information;
this upsampling-and-fusion processing is applied to the original feature maps between every two adjacent scales to obtain feature maps at multiple scales;
the obtained multi-scale feature maps are purified through an atrous spatial pyramid pooling network, and the purified feature maps are fed into the feature extraction network again to re-extract target key features, yielding multi-scale feature maps with richer semantic information.
3. The multi-view three-dimensional reconstruction method based on an attention mechanism and a deformable convolution depth network according to claim 1, wherein the block matching iterative model based on the attention mechanism and the deformable convolution depth network strengthens the key features of the reference image through the attention mechanism, so that the iterative computation focuses on key targets in the image;
pixels of each source view are then transformed into the reference-view coordinate system by homography, and the dot product of the reference view and the source view is computed in that coordinate system to obtain the image similarity;
the feature channels are divided into groups, and each group similarity is projected by a three-dimensional convolutional network into a cost value for each depth hypothesis of each pixel of the reference image;
the depth cost values of all the source views are then sum-normalized to obtain the probability of each depth hypothesis of each pixel, and depth regression outputs a depth map;
finally, the deformable convolution depth network samples the depth values of the pixels similar to each pixel and applies depth filtering to smooth away erroneous depth estimates;
during the iterative computation, the first iteration takes the target feature map as the initial input;
each subsequent iteration concatenates the target feature map with the depth map output by the previous iteration as its input.
4. The multi-view three-dimensional reconstruction method based on an attention mechanism and a deformable convolution depth network according to claim 1, wherein the residual depth optimization module normalizes the input depth map to [0, 1] to avoid the estimate being biased at any particular depth scale;
the reference-view feature map extracted by the multi-scale feature network is upsampled through a deconvolution network, the normalized depth map is upsampled, the two resulting features are fused, and depth residual information is extracted through a residual network;
finally, the obtained depth residual information is fused with the normalized and upsampled depth map to obtain the final optimized depth map.
5. The multi-view three-dimensional reconstruction method based on an attention mechanism and a deformable convolution depth network according to claim 1, wherein the depth map fusion module filters depth points according to illumination consistency and geometric-constraint consistency, eliminating depth points that do not meet the threshold conditions;
a visibility-based fusion algorithm minimizes the influence of occlusion, illumination, and similar factors, so that depth occlusions and conflicts between different views are minimized, a unified point-cloud representation is obtained, and the three-dimensional reconstruction result is output.
CN202310185694.XA (filed 2023-03-01) Multi-view three-dimensional reconstruction method based on attention mechanism and deformable convolution depth network · status: Pending · publication: CN116310098A (en)

Priority Applications (1)

CN202310185694.XA · priority date 2023-03-01 · filing date 2023-03-01 · Multi-view three-dimensional reconstruction method based on attention mechanism and deformable convolution depth network (CN116310098A)

Applications Claiming Priority (1)

CN202310185694.XA · priority date 2023-03-01 · filing date 2023-03-01 · Multi-view three-dimensional reconstruction method based on attention mechanism and deformable convolution depth network (CN116310098A)

Publications (1)

CN116310098A (en) · published 2023-06-23

Family

ID=86816186

Family Applications (1)

CN202310185694.XA · priority date 2023-03-01 · filing date 2023-03-01 · pending · Multi-view three-dimensional reconstruction method based on attention mechanism and deformable convolution depth network

Country Status (1)

CN: CN116310098A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117011466A (en) * 2023-08-08 2023-11-07 苏州三垣航天科技有限公司 Three-dimensional reconstruction method based on piecewise planar algorithm
CN117011466B (en) * 2023-08-08 2024-03-29 苏州三垣航天科技有限公司 Three-dimensional reconstruction method based on piecewise planar algorithm
CN117274351A (en) * 2023-11-02 2023-12-22 华东师范大学 Semantic-containing three-dimensional reconstruction method for multi-scale feature pyramid
CN117437363A (en) * 2023-12-20 2024-01-23 安徽大学 Large-scale multi-view stereoscopic method based on depth perception iterator
CN117437363B (en) * 2023-12-20 2024-03-22 安徽大学 Large-scale multi-view stereoscopic method based on depth perception iterator

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination