CN114519772A - Three-dimensional reconstruction method and system based on sparse point cloud and cost aggregation - Google Patents

Three-dimensional reconstruction method and system based on sparse point cloud and cost aggregation

Info

Publication number
CN114519772A
Authority
CN
China
Prior art keywords
cost
sparse point
depth
point cloud
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210090256.0A
Other languages
Chinese (zh)
Inventor
陶文兵
齐雨航
刘李漫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Tuke Intelligent Technology Co ltd
Original Assignee
Wuhan Tuke Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Tuke Intelligent Technology Co ltd filed Critical Wuhan Tuke Intelligent Technology Co ltd
Priority to CN202210090256.0A priority Critical patent/CN114519772A/en
Publication of CN114519772A publication Critical patent/CN114519772A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/529Depth or shape recovery from texture
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a three-dimensional reconstruction method and system based on sparse point cloud and cost aggregation, wherein the method comprises the following steps: acquiring a multi-view image and a plurality of corresponding sparse point clouds, and preprocessing the sparse point clouds to obtain depth maps at a plurality of viewing angles; extracting features from the multi-view image, constructing one or more cost volumes, and modulating and regularizing each cost volume using the plurality of sparse point clouds to obtain a plurality of probability volumes; and recovering a depth map from each probability volume and fusing it with the filtered depth maps at the multiple viewing angles to obtain a reconstructed point cloud model. Through a sparse-point-guided strategy, the method modulates the constructed cost volume with the sparse prior, and uses regularization and related means to improve the accuracy of the cost volume when estimating the depth of weak-texture regions and fine structures, thereby improving the reconstruction quality of the three-dimensional point cloud model.

Description

Three-dimensional reconstruction method and system based on sparse point cloud and cost aggregation
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a three-dimensional reconstruction method and a three-dimensional reconstruction system based on sparse point cloud and cost aggregation.
Background
Image-based three-dimensional reconstruction, which aims to recover three-dimensional geometry from multiple input images, is an important and challenging problem in the field of computer vision. Compared with active three-dimensional reconstruction based on laser radar, image-based three-dimensional reconstruction has the advantages of low cost, strong generality and the like.
Traditional multi-view three-dimensional reconstruction methods perform cross-view similarity search based on hand-crafted features and can obtain good reconstruction results in ideal Lambertian scenes; however, in weak-texture regions and regions with specular reflection, image features are difficult to extract and the reconstruction quality is unsatisfactory. In recent years, deep neural networks have found widespread use in the field of computer vision. Deep-learning methods automatically learn the features of an input image through a deep neural network trained on large amounts of labeled data. Compared with hand-crafted features, the features extracted by a deep neural network contain richer semantic information.
In 2020, Xu and Tao of Huazhong University of Science and Technology replaced the variance-based cost metric with Average Group-wise Correlation, reducing GPU memory consumption without lowering the reconstruction quality of the model. They also modeled the multi-view depth estimation problem as an inverse depth regression problem, so that the model performs better in scenes with a large depth range.
Although deep-learning-based methods have made great progress, they do not fully exploit the result of sparse reconstruction: only the camera pose information is used, while the sparse point cloud information is ignored or under-utilized.
Disclosure of Invention
In order to fully utilize the sparse point cloud information, improve the accuracy of depth estimation and thereby improve the reconstruction quality of the three-dimensional point cloud model, and in particular to address the difficulty of extracting image features in weak-texture regions and regions with specular reflection, the invention provides, in a first aspect, a three-dimensional reconstruction method based on sparse point cloud and cost aggregation, which comprises the following steps: acquiring a multi-view image and a plurality of corresponding sparse point clouds, and preprocessing the sparse point clouds to obtain depth maps at a plurality of viewing angles; extracting features from the multi-view image, constructing one or more cost volumes, and modulating and regularizing each cost volume using the plurality of sparse point clouds to obtain a plurality of probability volumes; and recovering a depth map from each probability volume and fusing it with the filtered depth maps at the multiple viewing angles to obtain a reconstructed point cloud model.
In some embodiments of the present invention, the extracting features from the multi-view image and constructing one or more cost volumes comprises: extracting features from each image of the multi-view image using a convolutional neural network to obtain a plurality of feature maps; selecting one of the feature maps as a reference feature map, taking the remaining feature maps as source feature maps, and computing a feature volume of each source feature map with respect to the reference feature map to obtain feature volumes of a plurality of views; and aggregating the feature volumes of the plurality of views into a cost volume.
Further, the aggregation of the feature volumes of the multiple views into a cost volume is realized as follows:

C = M(V_1, \ldots, V_N) = \frac{1}{N} \sum_{i=1}^{N} \left( V_i - \bar{V} \right)^2

where C denotes the cost volume, M denotes the element-wise variance operation, V_i denotes the i-th feature volume, N denotes the total number of feature volumes, and \bar{V} denotes the mean of all the feature volumes.
In some embodiments of the present invention, the modulating and regularizing each cost volume using the plurality of sparse point clouds to obtain a plurality of probability volumes comprises: constructing a Gaussian modulation function based on the depth maps at the multiple viewing angles; modulating each cost volume according to the Gaussian modulation function; and regularizing each cost volume using a 3D segmentation network to obtain a filtered probability volume.
Further, the regularization is implemented as:

\tilde{C}(v_0) = \sum_{k} \omega_k \cdot C(v_0 + v_k + \Delta v_k)

where C(v_0) is the cost of each voxel v_0 in the cost volume, \tilde{C}(v_0) is the cost of voxel v_0 after the regularization operation, \omega_k is the weight at the k-th sampling position, v_k is the fixed offset of the convolution receptive field, and \Delta v_k is the offset learned during adaptive cost aggregation.
In the above embodiments, the preprocessing of the plurality of sparse point clouds to obtain depth maps at a plurality of viewing angles comprises: obtaining the three-dimensional points corresponding to all key points in each view, and filtering out the invisible three-dimensional points; and obtaining the depth value of each filtered three-dimensional point in the image coordinate system through projection and coordinate transformation according to the camera extrinsic parameters of the current view.
In a second aspect of the present invention, a three-dimensional reconstruction system based on sparse point cloud and cost aggregation is provided, comprising: an acquisition module, configured to acquire a multi-view image and a plurality of corresponding sparse point clouds and to preprocess the sparse point clouds to obtain depth maps at a plurality of viewing angles; a construction module, configured to extract features from the multi-view image, construct one or more cost volumes, and modulate and regularize each cost volume using the plurality of sparse point clouds to obtain a plurality of probability volumes; and a reconstruction module, configured to recover a depth map from each probability volume and fuse it with the filtered depth maps at the multiple viewing angles to obtain a reconstructed point cloud model.
Further, the construction module comprises: an extraction unit, configured to extract features from each image of the multi-view image using a convolutional neural network to obtain a plurality of feature maps; a computation unit, configured to select one of the feature maps as a reference feature map, take the remaining feature maps as source feature maps, and compute a feature volume of each source feature map with respect to the reference feature map to obtain feature volumes of a plurality of views; and an aggregation unit, configured to aggregate the feature volumes of the multiple views into a cost volume.
In a third aspect of the present invention, there is provided an electronic device comprising: one or more processors; a storage device, configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the sparse point cloud and cost aggregation based three-dimensional reconstruction method provided by the present invention in the first aspect.
In a fourth aspect of the present invention, a computer-readable medium is provided, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the sparse point cloud and cost aggregation based three-dimensional reconstruction method provided in the first aspect of the present invention.
The invention has the beneficial effects that:
1. The method makes full use of the result of sparse reconstruction and incorporates the sparse point cloud obtained from sparse reconstruction into the cost volume as prior information. The sparse depth map obtained by projecting the sparse point cloud serves as a geometric prior of the scene, and the accuracy of depth estimation is improved by enhancing depth hypotheses near the sparse prior and suppressing depth hypotheses far from the prior depth value, especially at fine structures and depth discontinuities;
2. An adaptive cost aggregation module is introduced in the cost volume regularization stage, giving the network model scene-awareness capability; the offsets are learned adaptively in a data-driven manner, yielding more accurate depth estimates in weak-texture regions and at object boundaries.
Drawings
Fig. 1 is a basic flow diagram of a three-dimensional reconstruction method based on sparse point cloud and cost aggregation in some embodiments of the present invention;
FIG. 2 is a detailed flow chart of the three-dimensional reconstruction method based on sparse point cloud and cost aggregation in some embodiments of the invention;
FIG. 3 is a schematic diagram of a convolutional neural network for image feature extraction in some embodiments of the present invention;
FIG. 4 is a schematic diagram of sparse point cloud data preprocessing in some embodiments of the present invention;
FIG. 5 is a schematic diagram of the operating principle of a Gaussian modulation function in some embodiments of the invention;
FIG. 6 is a schematic diagram of a 3D U-Net network structure for adaptive cost aggregation in some embodiments of the present invention;
FIG. 7 is a schematic structural diagram of a three-dimensional reconstruction system based on sparse point cloud and cost aggregation in some embodiments of the present invention;
fig. 8 is a schematic structural diagram of an electronic device in some embodiments of the invention.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.
Referring to fig. 1 and fig. 2, in a first aspect of the present invention, a three-dimensional reconstruction method based on sparse point cloud and cost aggregation is provided, comprising: S100, acquiring a multi-view image and a plurality of corresponding sparse point clouds, and preprocessing the sparse point clouds to obtain depth maps at a plurality of viewing angles; S200, extracting features from the multi-view image, constructing one or more cost volumes, and modulating and regularizing each cost volume using the plurality of sparse point clouds to obtain a plurality of probability volumes; S300, recovering a depth map from each probability volume, and fusing it with the filtered depth maps at the multiple viewing angles to obtain a reconstructed point cloud model.
It can be understood that although deep-learning methods have made great progress, they do not fully exploit the result of sparse reconstruction: only the camera pose information is used, while the sparse point cloud information is ignored or under-utilized. The present method overcomes this problem and makes full use of the sparse point cloud information, thereby improving the accuracy of the three-dimensional reconstruction model. A multi-view image refers to multiple images taken from different viewing angles (views) of the same scene, i.e., the multiple images originate from the same scene.
In step S200 of some embodiments of the present invention, the extracting features from the multi-view image and constructing one or more cost volumes comprises: S201, extracting features from each image of the multi-view image using a convolutional neural network to obtain a plurality of feature maps; S202, selecting one of the feature maps as a reference feature map, taking the remaining feature maps as source feature maps, and computing a feature volume of each source feature map with respect to the reference feature map to obtain feature volumes of a plurality of views; S203, aggregating the feature volumes of the multiple views into a cost volume.
Specifically, in step S201, the convolutional neural network shown in fig. 3 performs feature extraction on the N input images {I_i}_{i=1}^{N} to obtain the corresponding feature maps {F_i}_{i=1}^{N}. The convolutional neural network contains 8 convolutional layers; except for the last convolutional layer, each convolutional layer is followed by a BN layer and a ReLU activation function. The image feature extraction module thus maps each 3 × H × W input image to a C-channel feature map, where H and W are the height and width of the input image, respectively, and C is the channel dimension of the feature map. The extracted feature maps are used for the subsequent cost volume construction.
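By way of illustration only, and not as a limitation of the claimed method, a minimal PyTorch sketch of such an 8-layer feature extraction network is given below; the channel widths, strides and the resulting 1/4 spatial downsampling are assumptions, since the description only fixes the layer count and the BN/ReLU pattern.

```python
import torch
import torch.nn as nn

class FeatureNet(nn.Module):
    """Illustrative 8-layer feature extractor: every conv except the last
    is followed by BatchNorm and ReLU (channel widths and strides are assumed)."""
    def __init__(self, out_channels: int = 32):
        super().__init__()
        def block(cin, cout, stride=1):
            return nn.Sequential(
                nn.Conv2d(cin, cout, 3, stride=stride, padding=1, bias=False),
                nn.BatchNorm2d(cout),
                nn.ReLU(inplace=True),
            )
        self.layers = nn.Sequential(
            block(3, 8), block(8, 8),
            block(8, 16, stride=2), block(16, 16), block(16, 16),
            block(16, 32, stride=2), block(32, 32),
            nn.Conv2d(32, out_channels, 3, padding=1),  # last conv: no BN/ReLU
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # image: (B, 3, H, W) -> feature map: (B, C, H/4, W/4) under the assumed strides
        return self.layers(image)
```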
Next, in step S202, F_1 is the feature map extracted from the reference image for which depth estimation is required, and {F_i}_{i=2}^{N} are the feature maps of the source images to be matched against the reference image; the camera intrinsic matrix K_i, rotation matrix R_i and translation vector t_i corresponding to each view are obtained from the sparse reconstruction. Taking the reference image feature F_1 as the reference, for a pixel p on the reference feature map and a depth hypothesis d_j, the corresponding pixel p_{i,j} on the source image feature F_i is computed as:

p_{i,j} \simeq K_i \left( R_i R_{ref}^{-1} \left( d_j K_{ref}^{-1} \tilde{p} - t_{ref} \right) + t_i \right)

where d_j is the j-th depth hypothesis, j ∈ {1, 2, …, N_d}, N_d is the number of depth hypotheses, \tilde{p} denotes the homogeneous coordinates of p, and the subscript ref denotes the reference view. The feature map of each image is warped by this projective transformation to obtain the corresponding feature volumes {V_i}_{i=1}^{N}. In order to flexibly handle an arbitrary number of input views, the invention aggregates the feature volumes of the multiple views {V_i}_{i=1}^{N} into the cost volume C using a variance-based metric:

C = M(V_1, \ldots, V_N) = \frac{1}{N} \sum_{i=1}^{N} \left( V_i - \bar{V} \right)^2

where \bar{V} is the mean of all the feature volumes, M denotes the element-wise variance operation, V_i denotes the i-th feature volume, and N denotes the total number of feature volumes.
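For illustration only, the variance-based aggregation of the warped feature volumes into the cost volume C can be sketched in PyTorch as follows; the (B, N, C, D, H, W) tensor layout, and the assumption that the per-view feature volumes have already been produced by the warping above, are not fixed by the description.

```python
import torch

def variance_cost_volume(feature_volumes: torch.Tensor) -> torch.Tensor:
    """Aggregate N warped feature volumes into one cost volume using the
    element-wise variance metric C = (1/N) * sum_i (V_i - mean(V))^2.

    feature_volumes: (B, N, C, D, H, W), one volume per view (layout assumed).
    returns:         (B, C, D, H, W)
    """
    mean = feature_volumes.mean(dim=1, keepdim=True)        # V-bar
    cost = ((feature_volumes - mean) ** 2).mean(dim=1)      # element-wise variance
    return cost

# Example usage (shapes are illustrative): 5 views, 32 feature channels,
# 192 depth hypotheses, quarter-resolution feature maps.
# V = torch.randn(1, 5, 32, 192, 120, 160)
# C = variance_cost_volume(V)   # -> (1, 32, 192, 120, 160)
```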
In step S200 of some embodiments of the present invention, the modulating and regularizing each cost volume using the plurality of sparse point clouds to obtain a plurality of probability volumes comprises: S204, constructing a Gaussian modulation function based on the depth maps at the multiple viewing angles; S205, modulating each cost volume according to the Gaussian modulation function; S206, regularizing each cost volume using a 3D segmentation network to obtain a filtered probability volume.
Referring to fig. 5, schematically, steps S204 and S205 comprise: constructing a Gaussian modulation function centered on the sparse depth value d' with the depth hypothesis d as the independent variable. The Gaussian modulation function is used for feature enhancement of the cost volume: the responses of depth hypotheses near d' are enhanced, while depth hypotheses far from d' are suppressed. The Gaussian modulation function is:

G(d) = k \cdot \exp\left( -\frac{(d - d')^2}{2 c^2} \right)

where d is the depth hypothesis value, d' is the sparse depth value, k is the amplitude of the Gaussian function and c is the bandwidth. Since the variance-based metric is used, a smaller cost means a higher confidence in the depth hypothesis, so the modulation applied to the cost volume is rewritten as:
\tilde{C}(p, d) = C(p, d) \left( 1 + k - k \exp\left( -\frac{(d - d'_p)^2}{2 c^2} \right) \right), \quad p \in \Omega_{sparse}

where \Omega_{sparse} is the set of pixels at which a prior depth exists; the cost at pixels without a prior depth is left unchanged.
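As a purely illustrative sketch, and not the claimed implementation, such a sparse-prior modulation of a variance-based cost volume might be realized as follows; the multiplicative form, the tensor layout and the default values of k and c are assumptions.

```python
import torch

def gaussian_modulate(cost: torch.Tensor,
                      sparse_depth: torch.Tensor,
                      depth_hyps: torch.Tensor,
                      k: float = 10.0,
                      c: float = 1.0) -> torch.Tensor:
    """Modulate a variance-based cost volume with a sparse depth prior.

    cost:         (B, C, D, H, W) cost volume (lower cost = higher confidence)
    sparse_depth: (B, H, W) sparse depth map, 0 where no prior exists
    depth_hyps:   (D,) depth hypothesis values d_j
    Near the prior depth d' the cost is left unchanged; far from it the cost is
    scaled up by up to (1 + k), which suppresses those hypotheses.
    """
    d = depth_hyps.view(1, 1, -1, 1, 1)                     # (1, 1, D, 1, 1)
    d_prior = sparse_depth.unsqueeze(1).unsqueeze(2)        # (B, 1, 1, H, W)
    gauss = torch.exp(-(d - d_prior) ** 2 / (2 * c ** 2))   # Gaussian around d'
    factor = 1.0 + k * (1.0 - gauss)                        # 1 near d', 1 + k far away
    has_prior = (sparse_depth > 0).float().unsqueeze(1).unsqueeze(2)
    factor = has_prior * factor + (1.0 - has_prior)         # untouched where no prior
    return cost * factor
```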
Referring to fig. 6, schematically, step S206 comprises: performing the regularization operation on the cost volume using a 3D U-Net to obtain a filtered probability volume. The method introduces an adaptive cost aggregation module in the cost volume regularization step; the adaptive cost aggregation module can learn the offsets adaptively, so that more accurate reconstruction is obtained at depth discontinuities. Optionally, the regularization of the cost volume may be implemented with a variant of the U-Net family or another 3D image segmentation network such as SegNet.
Specifically, at the end of step S206, probability volume normalization is performed: a 3D convolution converts the C-channel cost volume into a single-channel volume, and a soft-max normalization operation is applied to it along the depth direction to obtain the probability volume.
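For illustration only, this conversion of the regularized C-channel cost volume into a probability volume can be sketched as follows; the single 3D convolution stands in for the output layer of the full 3D U-Net, and the channel count is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CostToProbability(nn.Module):
    """Minimal sketch: collapse the (regularized) C-channel cost volume to a single
    channel with a 3D convolution, then apply soft-max along the depth dimension."""
    def __init__(self, in_channels: int = 32):
        super().__init__()
        self.to_single = nn.Conv3d(in_channels, 1, kernel_size=3, padding=1)

    def forward(self, cost: torch.Tensor) -> torch.Tensor:
        # cost: (B, C, D, H, W) -> probability volume: (B, D, H, W)
        score = self.to_single(cost).squeeze(1)
        return F.softmax(score, dim=1)   # normalize over the depth hypotheses
```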
In step S206, the regularization is implemented as:

\tilde{C}(v_0) = \sum_{k} \omega_k \cdot C(v_0 + v_k + \Delta v_k)

where C(v_0) is the cost of each voxel v_0 in the cost volume, \tilde{C}(v_0) is the cost of voxel v_0 after the regularization operation, \omega_k is the weight at the k-th sampling position, v_k is the fixed offset of the convolution receptive field, and \Delta v_k is the offset learned during adaptive cost aggregation.
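By way of illustration, the adaptive aggregation formula above can be sketched for a single-channel cost volume as follows; in the actual network the offsets Δv_k and weights ω_k would be predicted by convolution layers, whereas here they are passed in as inputs, and all tensor shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def adaptive_cost_aggregation(cost: torch.Tensor,
                              fixed_offsets: torch.Tensor,
                              learned_offsets: torch.Tensor,
                              weights: torch.Tensor) -> torch.Tensor:
    """Toy illustration of C~(v0) = sum_k w_k * C(v0 + v_k + dv_k) with trilinear
    sampling at the (generally fractional) positions v0 + v_k + dv_k.

    cost:            (B, 1, D, H, W) cost volume
    fixed_offsets:   (K, 3)  fixed offsets v_k of the convolution footprint, order (dz, dy, dx)
    learned_offsets: (B, K, D, H, W, 3) learned offsets dv_k, same order, in voxel units
    weights:         (K,)    aggregation weights w_k
    """
    B, _, D, H, W = cost.shape
    device = cost.device
    # voxel coordinates of every v0, ordered (z, y, x)
    z, y, x = torch.meshgrid(torch.arange(D, device=device),
                             torch.arange(H, device=device),
                             torch.arange(W, device=device), indexing="ij")
    base = torch.stack((z, y, x), dim=-1).float()                         # (D, H, W, 3)
    size = torch.tensor([D - 1, H - 1, W - 1], device=device).clamp(min=1).float()

    out = torch.zeros_like(cost)
    for k in range(fixed_offsets.shape[0]):
        pos = base + fixed_offsets[k].to(device) + learned_offsets[:, k]  # (B, D, H, W, 3)
        # convert (z, y, x) voxel coordinates to grid_sample's normalized (x, y, z) in [-1, 1]
        grid = (2.0 * pos / size - 1.0).flip(dims=(-1,))
        sampled = F.grid_sample(cost, grid, mode="bilinear",
                                padding_mode="border", align_corners=True)
        out = out + weights[k] * sampled
    return out
```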
It is understood that voxelization is the conversion of a geometric representation of an object into the voxel representation closest to that object, resulting in a volume data set that contains not only the surface information of the model but also its internal properties. Voxels are the three-dimensional counterpart of pixels in an image and allow objects in three-dimensional space to be represented on a regular grid of spatial volume elements.
Typical voxel representations include the SDF and the TSDF: the SDF (Signed Distance Field) models the object surface by assigning a signed distance value to each voxel. An SDF value greater than 0 indicates that the voxel lies in front of the surface, a value less than 0 indicates that it lies behind the surface, and the closer the SDF is to 0, the closer the voxel is to the real surface of the scene. However, this representation occupies a large amount of resources, which is why the TSDF was proposed;
the TSDF (Truncated Signed Distance Field) reduces the resource consumption of the voxel representation. The TSDF represents three-dimensional space with a grid of cubes; each grid cell stores the truncated distance between the cell and the object surface, the positive and negative signs indicate respectively that the cell is in front of (visible from) or behind (occluded by) the surface, and points on the surface correspond to the zero crossing.
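As background illustration only (the fusion in this invention is point-based rather than TSDF-based), a minimal NumPy sketch of computing a TSDF from one depth map is given below; the function name, truncation distance and nearest-pixel lookup are assumptions.

```python
import numpy as np

def tsdf_from_depth(voxel_centers: np.ndarray, depth: np.ndarray,
                    K: np.ndarray, R: np.ndarray, t: np.ndarray,
                    trunc: float = 0.05) -> np.ndarray:
    """Illustrative TSDF computation from a single depth map.

    voxel_centers: (M, 3) voxel centres in world coordinates
    depth:         (H, W) depth map of the current view
    K, R, t:       camera parameters, following P_cam = R @ P_world + t
    Returns the truncated signed distance of every voxel (NaN where unobserved):
    positive in front of the observed surface, negative behind it.
    """
    H, W = depth.shape
    p_cam = (R @ voxel_centers.T + t.reshape(3, 1)).T        # (M, 3)
    uvw = (K @ p_cam.T).T                                    # rows are d * (u, v, 1)
    front = uvw[:, 2] > 1e-6
    u = np.zeros(len(p_cam), dtype=int)
    v = np.zeros(len(p_cam), dtype=int)
    u[front] = np.round(uvw[front, 0] / uvw[front, 2]).astype(int)
    v[front] = np.round(uvw[front, 1] / uvw[front, 2]).astype(int)

    valid = front & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    d_obs = np.zeros(len(p_cam))
    d_obs[valid] = depth[v[valid], u[valid]]
    valid &= d_obs > 0                                       # pixels with a measured depth

    tsdf = np.full(len(p_cam), np.nan)
    z = p_cam[:, 2]                                          # voxel depth along the optical axis
    tsdf[valid] = np.clip((d_obs[valid] - z[valid]) / trunc, -1.0, 1.0)
    return tsdf
```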
In step S300 of some embodiments of the present invention, the recovering a depth map from each probability volume and fusing it with the filtered depth maps at the multiple viewing angles to obtain a reconstructed point cloud model comprises the following steps:
S301, depth map regression: to achieve depth estimation accuracy at the sub-pixel level, the invention uses the weighted average of all depth hypotheses as the final depth output (the soft-argmin operation), the weight of each term being the probability value of that hypothesis. The pixel-wise depth estimate is computed as:

\hat{d} = \sum_{j=1}^{N_d} d_j \cdot P(d_j)

where P(d_j) is the probability value corresponding to the depth hypothesis d_j;
S302, photometric confidence map computation: the photometric confidence map measures the quality of the multi-view photometric consistency matching; the photometric confidence is obtained by summing the probabilities of the four depth hypotheses nearest to the estimated depth value. The photometric confidence map is used for the depth map filtering of the above embodiments;
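For illustration only, the soft-argmin regression of step S301 and the four-hypothesis photometric confidence of step S302 might be sketched together as follows; the tensor layout, the assumption of uniformly spaced depth hypotheses and the particular choice of the four neighbouring hypotheses are assumptions.

```python
import torch
import torch.nn.functional as F

def regress_depth(prob: torch.Tensor, depth_hyps: torch.Tensor):
    """Soft-argmin depth regression plus a simple photometric confidence.

    prob:       (B, D, H, W) probability volume, already soft-max normalised over D
    depth_hyps: (D,) depth hypothesis values d_j (assumed uniformly spaced)
    Returns (depth, confidence), where the confidence is the sum of the
    probabilities of the 4 hypotheses nearest the regressed depth.
    """
    d = depth_hyps.view(1, -1, 1, 1)
    depth = (prob * d).sum(dim=1)                            # expectation over hypotheses

    # index of the hypothesis nearest the regressed depth (uniform sampling assumed)
    step = depth_hyps[1] - depth_hyps[0]
    idx = ((depth - depth_hyps[0]) / step).round().long().clamp(0, prob.shape[1] - 1)

    # sum the probabilities of the 4 nearest hypotheses around that index
    prob_pad = F.pad(prob, (0, 0, 0, 0, 2, 2))               # pad depth dim by 2 on each side
    conf = sum(torch.gather(prob_pad, 1, (idx + 2 + o).unsqueeze(1)).squeeze(1)
               for o in (-1, 0, 1, 2))
    return depth, conf
```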
S303, depth map filtering: depth map filtering uses photometric consistency and geometric consistency for robust filtering of the depth maps. The invention uses the probability map to measure the quality of the depth estimates; points whose probability value is lower than τ_0 are filtered out as outliers. Geometric consistency measures the consistency of depth between multiple views: a reference image pixel p_1 with depth value d_1 is projected to a point p_i in a neighboring view, and the depth value d_i at p_i is then reprojected onto the reference image at the point p_reproj with depth value d_reproj. If |p_reproj - p_1| < τ_1 and |d_reproj - d_1| / d_1 < τ_2 are satisfied, the depth estimate d_1 of pixel p_1 is said to be consistent between the two views. In the invention, to ensure the cross-view consistency of the depth estimation, only depth estimates that are consistent in at least n_τ views are retained. The filtered pixels are back-projected into three-dimensional space to obtain a dense three-dimensional point cloud model.
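By way of illustration, the two-view geometric consistency test described above might be sketched as follows; the nearest-neighbour depth lookup and the parameter names are assumptions, and in the full method this per-pixel test would be repeated over all neighbouring views and combined with the n_τ-view threshold.

```python
import numpy as np

def geometric_consistency(depth_ref, depth_src,
                          K_ref, R_ref, t_ref, K_src, R_src, t_src,
                          tau1: float = 1.0, tau2: float = 0.01) -> np.ndarray:
    """Two-view geometric consistency check. Extrinsics follow P_cam = R @ P_world + t.
    Returns an (H, W) boolean mask of reference pixels consistent with the source view."""
    H, W = depth_ref.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1).astype(float)
    d1 = depth_ref.reshape(-1)

    # lift reference pixels to world space and project them into the source view
    P_world = R_ref.T @ (np.linalg.inv(K_ref) @ pix * d1 - t_ref.reshape(3, 1))
    p_src = K_src @ (R_src @ P_world + t_src.reshape(3, 1))
    z_src = np.where(np.abs(p_src[2]) > 1e-8, p_src[2], 1e-8)
    us, vs = p_src[0] / z_src, p_src[1] / z_src
    inside = (p_src[2] > 0) & (us >= 0) & (us < W) & (vs >= 0) & (vs < H)

    # read the source depth at the projected location (nearest neighbour)
    ui = np.clip(np.round(us).astype(int), 0, W - 1)
    vi = np.clip(np.round(vs).astype(int), 0, H - 1)
    d_src = np.where(inside, depth_src[vi, ui], 0.0)

    # reproject the source depth back into the reference view
    P_back = R_src.T @ (np.linalg.inv(K_src) @ np.stack([us, vs, np.ones_like(us)]) * d_src
                        - t_src.reshape(3, 1))
    p_re = K_ref @ (R_ref @ P_back + t_ref.reshape(3, 1))
    d_re = p_re[2]
    z_re = np.where(np.abs(d_re) > 1e-8, d_re, 1e-8)
    err_pix = np.hypot(p_re[0] / z_re - pix[0], p_re[1] / z_re - pix[1])
    err_depth = np.abs(d_re - d1) / np.maximum(d1, 1e-8)

    return (inside & (d_src > 0) & (err_pix < tau1) & (err_depth < tau2)).reshape(H, W)
```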
In step S100 of the foregoing embodiments, the preprocessing of the plurality of sparse point clouds to obtain depth maps at a plurality of viewing angles comprises: S101, obtaining the three-dimensional points corresponding to all key points in each view, and filtering out the invisible three-dimensional points; and S102, obtaining the depth value of each three-dimensional point in the image coordinate system through projection and coordinate transformation according to the camera extrinsic parameters of the current view.
Schematically, step S101 may refer to fig. 4: the three-dimensional points corresponding to all key points in each view are obtained, the invisible three-dimensional points are filtered out, and the three-dimensional points visible in the view are denoted P_world. The three-dimensional points in the world coordinate system are converted into the camera coordinate system according to the camera extrinsic parameters of the current view, giving the points P_cam in the camera coordinate system:

P_{cam} = R \cdot P_{world} + t

where R is the rotation matrix and t is the translation vector. The projected point and the depth value of each three-dimensional point in the image coordinate system are then obtained from the projection relation:

d \cdot (u, v, 1)^T = K \cdot P_{cam}

where (u, v, 1)^T are the homogeneous coordinates of the image pixel, d is the depth value of the pixel, and K is the camera intrinsic matrix.
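For illustration only, the projection of the visible sparse points of one view into a sparse depth map (steps S101 and S102) might be sketched as follows; the handling of several points falling on the same pixel (keeping the nearest) is an assumption.

```python
import numpy as np

def sparse_depth_map(points_world: np.ndarray, K: np.ndarray, R: np.ndarray,
                     t: np.ndarray, H: int, W: int) -> np.ndarray:
    """Project visible sparse 3D points of one view into a sparse depth map,
    following P_cam = R @ P_world + t and d * (u, v, 1)^T = K @ P_cam.
    points_world: (M, 3); returns an (H, W) depth map, 0 where no point projects."""
    P_cam = R @ points_world.T + t.reshape(3, 1)             # (3, M)
    uvd = K @ P_cam                                          # rows are d * (u, v, 1)
    d = uvd[2]
    keep = d > 1e-6
    u = np.round(uvd[0, keep] / d[keep]).astype(int)
    v = np.round(uvd[1, keep] / d[keep]).astype(int)
    d = d[keep]
    inside = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    u, v, d = u[inside], v[inside], d[inside]

    depth = np.zeros((H, W))
    order = np.argsort(-d)                                   # write nearest points last
    depth[v[order], u[order]] = d[order]
    return depth
```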
It should be understood that the parameters used in cost volume construction, cost volume modulation, depth map filtering and point cloud fusion need to be set before the method is carried out. In practical applications, the depth maps regressed from cost volumes with different numbers of depth hypotheses differ; Gaussian modulation functions with different amplitudes and bandwidths impose different constraints; and different depth map fusion parameters yield three-dimensional point cloud models with different visual effects. The parameters used in the invention are: number of depth hypothesis planes N_d = 192, k = 10, c = 2Δd with Δd = d_{j+1} - d_j, τ_0 = 0.8, τ_1 = 1, τ_2 = 0.01, and n_τ = 3.
Example 2
Referring to fig. 7, in a second aspect of the present invention, a three-dimensional reconstruction system 1 based on sparse point cloud and cost aggregation is provided, comprising: an acquisition module 11, configured to acquire a multi-view image and a plurality of corresponding sparse point clouds and to preprocess the sparse point clouds to obtain depth maps at a plurality of viewing angles; a construction module 12, configured to extract features from the multi-view image, construct one or more cost volumes, and modulate and regularize each cost volume using the plurality of sparse point clouds to obtain a plurality of probability volumes; and a reconstruction module 13, configured to recover a depth map from each probability volume and fuse it with the filtered depth maps at the multiple viewing angles to obtain a reconstructed point cloud model.
Further, the construction module 12 comprises: an extraction unit, configured to extract features from each image of the multi-view image using a convolutional neural network to obtain a plurality of feature maps; a computation unit, configured to select one of the feature maps as a reference feature map, take the remaining feature maps as source feature maps, and compute a feature volume of each source feature map with respect to the reference feature map to obtain feature volumes of a plurality of views; and an aggregation unit, configured to aggregate the feature volumes of the multiple views into a cost volume.
Example 3
Referring to fig. 8, in a third aspect of the present invention, there is provided an electronic apparatus comprising: one or more processors; storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to carry out the method of the invention in the first aspect.
The electronic device 500 may include a processing means (e.g., central processing unit, graphics processor, etc.) 501 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)502 or a program loaded from a storage means 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the electronic apparatus 500 are also stored. The processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
The following devices may be connected to the I/O interface 505 in general: input devices 506 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 507 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; a storage device 508 including, for example, a hard disk; and a communication device 509. The communication means 509 may allow the electronic device 500 to communicate with other devices wirelessly or by wire to exchange data. While fig. 8 illustrates an electronic device 500 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 8 may represent one device or may represent multiple devices as desired.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 509, or installed from the storage means 508, or installed from the ROM 502. The computer program, when executed by the processing device 501, performs the above-described functions defined in the methods of embodiments of the present disclosure. It should be noted that the computer readable medium described in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In embodiments of the present disclosure, however, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more computer programs which, when executed by the electronic device, cause the electronic device to:
computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C + +, Python, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A three-dimensional reconstruction method based on sparse point cloud and cost aggregation, characterized by comprising the following steps:
acquiring a multi-view image and a plurality of corresponding sparse point clouds, and preprocessing the sparse point clouds to obtain depth maps at a plurality of viewing angles;
extracting features from the multi-view image, constructing one or more cost volumes, and modulating and regularizing each cost volume using the plurality of sparse point clouds to obtain a plurality of probability volumes;
and recovering a depth map from each probability volume, and fusing it with the filtered depth maps at the multiple viewing angles to obtain a reconstructed point cloud model.
2. The sparse point cloud and cost aggregation-based three-dimensional reconstruction method of claim 1, wherein the extracting features from the multi-view image and constructing one or more cost volumes comprises:
extracting features from each image of the multi-view image using a convolutional neural network to obtain a plurality of feature maps;
selecting one of the feature maps as a reference feature map, taking the remaining feature maps as source feature maps, and computing a feature volume of each source feature map with respect to the reference feature map to obtain feature volumes of a plurality of views;
and aggregating the feature volumes of the plurality of views into a cost volume.
3. The sparse point cloud and cost aggregation-based three-dimensional reconstruction method of claim 2, wherein the aggregation of the feature volumes of the plurality of views into the cost volume is realized as:

C = M(V_1, \ldots, V_N) = \frac{1}{N} \sum_{i=1}^{N} \left( V_i - \bar{V} \right)^2

where C denotes the cost volume, M denotes the element-wise variance operation, V_i denotes the i-th feature volume, N denotes the total number of feature volumes, and \bar{V} denotes the mean of all the feature volumes.
4. The sparse point cloud and cost aggregation-based three-dimensional reconstruction method of claim 1, wherein the modulating and regularizing each cost volume using the plurality of sparse point clouds to obtain a plurality of probability volumes comprises:
constructing a Gaussian modulation function based on the depth maps at the plurality of viewing angles;
modulating each cost volume according to the Gaussian modulation function;
and regularizing each cost volume using a 3D segmentation network to obtain a filtered probability volume.
5. The sparse point cloud and cost aggregation-based three-dimensional reconstruction method of claim 4, wherein the regularization is implemented as:

\tilde{C}(v_0) = \sum_{k} \omega_k \cdot C(v_0 + v_k + \Delta v_k)

where C(v_0) is the cost of each voxel v_0 in the cost volume, \tilde{C}(v_0) is the cost of voxel v_0 after the regularization operation, \omega_k is the weight at the k-th sampling position, v_k is the fixed offset of the convolution receptive field, and \Delta v_k is the offset learned during adaptive cost aggregation.
6. The sparse point cloud and cost aggregation-based three-dimensional reconstruction method according to any one of claims 1 to 5, wherein the preprocessing of the plurality of sparse point clouds to obtain depth maps at a plurality of viewing angles comprises:
obtaining the three-dimensional points corresponding to all key points in each view, and filtering out the invisible three-dimensional points;
and obtaining the depth value of each filtered three-dimensional point in the image coordinate system through projection and coordinate transformation according to the camera extrinsic parameters of the current view.
7. A three-dimensional reconstruction system based on sparse point cloud and cost aggregation, comprising:
an acquisition module, configured to acquire a multi-view image and a plurality of corresponding sparse point clouds, and to preprocess the sparse point clouds to obtain depth maps at a plurality of viewing angles;
a construction module, configured to extract features from the multi-view image, construct one or more cost volumes, and modulate and regularize each cost volume using the plurality of sparse point clouds to obtain a plurality of probability volumes;
and a reconstruction module, configured to recover a depth map from each probability volume and fuse it with the filtered depth maps at the multiple viewing angles to obtain a reconstructed point cloud model.
8. The sparse point cloud and cost aggregation-based three-dimensional reconstruction system of claim 7, wherein the construction module comprises:
an extraction unit, configured to extract features from each image of the multi-view image using a convolutional neural network to obtain a plurality of feature maps;
a computation unit, configured to select one of the feature maps as a reference feature map, take the remaining feature maps as source feature maps, and compute a feature volume of each source feature map with respect to the reference feature map to obtain feature volumes of a plurality of views;
and an aggregation unit, configured to aggregate the feature volumes of the multiple views into a cost volume.
9. An electronic device, comprising: one or more processors; a storage device to store one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the sparse point cloud and cost aggregation based three-dimensional reconstruction method of any one of claims 1 to 6.
10. A computer readable medium having a computer program stored thereon, wherein the computer program when executed by a processor implements the sparse point cloud and cost aggregation based three-dimensional reconstruction method of any one of claims 1 to 6.
CN202210090256.0A 2022-01-25 2022-01-25 Three-dimensional reconstruction method and system based on sparse point cloud and cost aggregation Pending CN114519772A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210090256.0A CN114519772A (en) 2022-01-25 2022-01-25 Three-dimensional reconstruction method and system based on sparse point cloud and cost aggregation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210090256.0A CN114519772A (en) 2022-01-25 2022-01-25 Three-dimensional reconstruction method and system based on sparse point cloud and cost aggregation

Publications (1)

Publication Number Publication Date
CN114519772A true CN114519772A (en) 2022-05-20

Family

ID=81597577

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210090256.0A Pending CN114519772A (en) 2022-01-25 2022-01-25 Three-dimensional reconstruction method and system based on sparse point cloud and cost aggregation

Country Status (1)

Country Link
CN (1) CN114519772A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114820755A (en) * 2022-06-24 2022-07-29 武汉图科智能科技有限公司 Depth map estimation method and system
CN115861401A (en) * 2023-02-27 2023-03-28 之江实验室 Binocular and point cloud fusion depth recovery method, device and medium
CN117671163A (en) * 2024-02-02 2024-03-08 苏州立创致恒电子科技有限公司 Multi-view three-dimensional reconstruction method and system
CN117671163B (en) * 2024-02-02 2024-04-26 苏州立创致恒电子科技有限公司 Multi-view three-dimensional reconstruction method and system
CN118365805A (en) * 2024-06-19 2024-07-19 淘宝(中国)软件有限公司 Three-dimensional scene reconstruction method and electronic equipment

Similar Documents

Publication Publication Date Title
CN106651938B (en) A kind of depth map Enhancement Method merging high-resolution colour picture
CN109003325B (en) Three-dimensional reconstruction method, medium, device and computing equipment
CN111325796B (en) Method and apparatus for determining pose of vision equipment
CN106910242B (en) Method and system for carrying out indoor complete scene three-dimensional reconstruction based on depth camera
CN105721853B (en) Generate method, system and the computer readable storage devices of image capture instruction
CN114519772A (en) Three-dimensional reconstruction method and system based on sparse point cloud and cost aggregation
WO2018127007A1 (en) Depth image acquisition method and system
WO2019011249A1 (en) Method, apparatus, and device for determining pose of object in image, and storage medium
US20110274343A1 (en) System and method for extraction of features from a 3-d point cloud
Chen et al. Transforming a 3-d lidar point cloud into a 2-d dense depth map through a parameter self-adaptive framework
Jaegle et al. Fast, robust, continuous monocular egomotion computation
CN113129352B (en) Sparse light field reconstruction method and device
CN114998406B (en) Self-supervision multi-view depth estimation method and device
CN116563493A (en) Model training method based on three-dimensional reconstruction, three-dimensional reconstruction method and device
JP2024507727A (en) Rendering a new image of a scene using a geometric shape recognition neural network conditioned on latent variables
CN115082540B (en) Multi-view depth estimation method and device suitable for unmanned aerial vehicle platform
CN112907557A (en) Road detection method, road detection device, computing equipment and storage medium
CN114372523A (en) Binocular matching uncertainty estimation method based on evidence deep learning
CN114677479A (en) Natural landscape multi-view three-dimensional reconstruction method based on deep learning
CN110827341A (en) Picture depth estimation method and device and storage medium
Biasutti et al. Visibility estimation in point clouds with variable density
Seo et al. An efficient detection of vanishing points using inverted coordinates image space
Tanner et al. DENSER cities: A system for dense efficient reconstructions of cities
CN109816726A (en) A kind of visual odometry map updating method and system based on depth filter
Hu et al. 3D map reconstruction using a monocular camera for smart cities

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Country or region after: China

Address after: No. 548, 5th Floor, Building 10, No. 28 Linping Avenue, Donghu Street, Linping District, Hangzhou City, Zhejiang Province

Applicant after: Hangzhou Tuke Intelligent Information Technology Co.,Ltd.

Address before: 430000 B033, No. 05, 4th floor, building 2, international enterprise center, No. 1, Guanggu Avenue, Donghu New Technology Development Zone, Wuhan, Hubei (Wuhan area of free trade zone)

Applicant before: Wuhan Tuke Intelligent Technology Co.,Ltd.

Country or region before: China

CB02 Change of applicant information