CN115205489A - Three-dimensional reconstruction method, system and device in large scene - Google Patents


Info

Publication number
CN115205489A
Authority
CN
China
Prior art keywords
image
matching
reconstruction
dimensional
cloud
Prior art date
Legal status
Pending
Application number
CN202210630432.5A
Other languages
Chinese (zh)
Inventor
梁凌宇
邹朝军
Current Assignee
Guangzhou Zhongsi Artificial Intelligence Technology Co ltd
Original Assignee
Guangzhou Zhongsi Artificial Intelligence Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Zhongsi Artificial Intelligence Technology Co., Ltd.
Priority to CN202210630432.5A
Publication of CN115205489A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20Finite element generation, e.g. wire-frame surface description, tesselation
    • G06T17/205Re-meshing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]


Abstract

The invention discloses a three-dimensional reconstruction method under a large scene, which comprises the following steps: acquiring image data of the reconstruction target through RGB image acquisition equipment and preprocessing the image data; retrieving and matching the images, computing the feature points of each image, and matching the feature points; computing the camera pose corresponding to each image; obtaining a dense point cloud intermediate model of the scene from the images and the corresponding camera poses; and post-processing the three-dimensional point cloud model to finally obtain a three-dimensional reconstruction mesh model. The method solves the problems of overlong image data matching time, low precision in some scenes, and insufficient completeness under large scenes. In addition, the invention also provides a three-dimensional reconstruction system and a device, wherein the system is used for realizing the reconstruction method and the device is used for deploying the system. The method, the system and the device can realize three-dimensional reconstruction in a large scene, separate image acquisition from the three-dimensional reconstruction computation, and have broad application prospects.

Description

Three-dimensional reconstruction method, system and device in large scene
Technical Field
The invention relates to the technical field of image processing, in particular to a three-dimensional reconstruction method, a three-dimensional reconstruction system and a three-dimensional reconstruction device in a large scene.
Background
Three-dimensional reconstruction techniques refer to the reconstruction of real-world scenes or objects into computer-expressible and processable data models using computer technology. Three-dimensional reconstruction based on RGB images is increasingly widely applied due to the low requirements on acquisition equipment and the low cost of the reconstruction process. At present, the three-dimensional reconstruction technology based on the RGB image can be mainly divided into two steps, including sparse reconstruction based on a motion recovery structure technology and dense reconstruction based on a multi-view stereo technology.
The motion recovery structure technology is used for recovering the camera poses. Feature point extraction and matching are its key steps; at present the industry mainly adopts SIFT features to extract and describe feature points. SIFT features have illumination and scale invariance and perform well in richly textured scenes; however, for solid-colour scenes such as wall surfaces and floors, features are difficult to extract and the reconstruction effect is poor.
The multi-view stereo reconstruction technology is to recover dense point cloud of a scene, and at present, the technology can be divided into multi-view stereo reconstruction algorithms based on point cloud diffusion, voxel and depth map. The point cloud diffusion method limits the parallel capability of calculation in the propagation process, and the speed is very low; voxel reconstruction occupies a large amount of memory space, and memory consumption is unacceptable under the requirement of large-scene application; the point cloud reconstruction method based on the depth map estimates the depth map of each picture, then performs depth fusion, and decouples the MVS task into the depth estimation task of each view, so that the method is very suitable for large-scale scene reconstruction. In order to ensure the feasibility and efficiency of large scene reconstruction, the reconstruction method is also based on the depth map. The multi-view stereo reconstruction technology has poor performance in some scenes including wall surfaces, glass and the like, and the reconstruction precision and integrity are insufficient, which is a problem to be solved urgently.
In a large scene, tens of thousands of RGB images are needed for reconstruction, and the efficiency of the reconstruction process is obviously reduced. How to improve reconstruction efficiency under the requirement of large scene reconstruction is another problem to be solved.
Disclosure of Invention
The invention aims to overcome the defects in the prior art, provides a three-dimensional reconstruction method, a three-dimensional reconstruction system and a three-dimensional reconstruction device in a large scene, and solves the problems of overlong image data matching time, low precision of partial scenes and insufficient integrity in the large scene.
In order to achieve the purpose, the invention provides a three-dimensional reconstruction method under a large scene, which comprises the following steps:
s1, acquiring image data of a reconstructed target through RGB image acquisition equipment, and preprocessing the image data;
s2, retrieving and matching the images, calculating the characteristic points of each image, and matching the characteristic points;
s3, calculating the corresponding camera pose of each image;
s4, obtaining a dense point cloud intermediate model of the scene according to the image and the corresponding camera pose;
and S5, post-processing the three-dimensional point cloud model to finally obtain a three-dimensional reconstruction grid model.
Preferably, the three-dimensional reconstruction method includes the steps of:
(1) RGB image data of the scene are acquired by an image acquisition device; the data may be pictures or videos in various formats. For video data, the video is adaptively sampled according to its length and quality to obtain scene pictures. An image set of the scene I = {I_i | i = 1, 2, ..., N} is obtained.
(2) Using an image retrieval technique based on the deep neural network NetVLAD, a vector G_i is extracted from each picture I_i of the image set I as the global descriptor of the picture. Pictures of similar viewing angles are rapidly matched according to the global descriptors, yielding the set of matched image pairs C = {{I_a, I_b} | I_a, I_b ∈ I, a < b}.
(3) Feature points are extracted and described with SuperPoint, a deep neural network, giving the feature point representation F = (x, d), where x is the keypoint and d is the descriptor.
First, a VGG-based convolutional neural network is used as the encoder to downsample and encode an image of size W × H into a feature map of size W/8 × H/8. The keypoint representation and the descriptor representation of the feature points are then obtained by a keypoint decoder and a descriptor decoder running in parallel. The keypoint decoder convolves the feature map into 65 channels, corresponding to 64 region channels and one 'no keypoint' channel; a W × H × 1 keypoint map is finally obtained through Softmax and Reshape. The descriptor decoder adopts a UCN-like fully convolutional structure to extract more accurate geometric and semantic information, and then a W × H × D description matrix is obtained through bicubic interpolation and L2 normalization, corresponding to the D-dimensional descriptor d of each keypoint x.
(4) Feature point matching is performed with SuperGlue, based on a graph convolutional neural network. On a matched image pair C_{a,b} = {I_a, I_b}, the set of feature point matches A_{a,b} = {A(F_i, F_j) | F_i ∈ I_a, F_j ∈ I_b} is obtained.
For a feature point F_i = (x_i, d_i), a multilayer perceptron (MLP) merges the keypoint position and the description information:
y_i = d_i + MLP(x_i)
All subsequent processing uses y_i as the representation of the feature point.
An attention-based graph convolutional neural network then aggregates feature points within and between images. The nodes of the graph are the feature points y_i; the edges are of two types: edges ε_s connecting feature points within an image and edges ε_c connecting feature points between images. ε_s reflects the neighbourhood information of a feature point, while ε_c describes the similarity of feature points across images. A multilayer GNN aggregates the neighbour features of each feature point and the cross-image similarity information, finally producing the matching description z_i of feature point y_i.
Using the matching descriptions z, an assignment matrix is constructed over all feature points of the two matched images C_{a,b}. The assignment matrix is iteratively optimized with the Sinkhorn algorithm, realizing the feature point matching A_{a,b} = {A(F_i, F_j) | F_i ∈ I_a, F_j ∈ I_b} between the two images C_{a,b} = {I_a, I_b}.
(5) Camera pose estimation and sparse point cloud reconstruction of the scene are performed with the motion recovery structure (structure-from-motion) technique. Incremental SfM carries out initialization, image registration, triangulation, and bundle adjustment optimization, giving the camera pose corresponding to each picture and a sparse point cloud of the scene. The steps are as follows:
and (5) initializing. According to matching feature points
Figure BDA0003679363630000041
Number and distribution selection in images of initially matching image pairs C init And selecting the image pair with the most matching points and the most uniform distribution as much as possible. The relative pose of the camera is calculated using epipolar geometric constraints, including the rotation matrix R and the translation vector t. Obtained mainly by solving the following epipolar geometric constraint equation:
Figure BDA0003679363630000042
and (5) triangularization. Known camera (relative) poses R, t and matching point position x i 、x j The following relationship holds:
Z i x j ×Rx i +x i ×t=0
solving the equation by matching the coordinates of the characteristic point pairs to obtain the characteristic points x i Depth Z of i And obtaining the position of the three-dimensional space. After initialization and triangularization, an initial model M is obtained containing only two images init
Image registration. The remaining images are registered to the initial model M_init using the PnP algorithm. The coordinates of the three-dimensional points in the camera coordinate system are obtained through spatially similar geometric relations; from the coordinates {P_k | k = 1, 2, ..., n} of the n three-dimensional points in the world coordinate system and their coordinates {P_k' | k = 1, 2, ..., n} in the camera coordinate system, the camera pose is finally solved with the iterative closest point (ICP) algorithm, i.e. by optimizing
min_{R,t} Σ_{k=1}^{n} || P_k' − (R P_k + t) ||^2
Bundle adjustment optimization (BA). The positions of the three-dimensional points and the camera parameters are adjusted so that the error of reprojecting the three-dimensional points into the images is minimized, i.e. the following cost function is optimized:
E(Φ, P) = Σ || h(Φ, P) − p ||^2
where Φ(K, R, t) are the camera parameters, P is the world coordinate of a three-dimensional point in space, h(·) is the projection function, and p is the pixel coordinate of the three-dimensional point in the image. By adjusting Φ and P, the above reprojection cost is minimized.
(6) Using the camera parameters Φ(K, R, t) obtained in the motion recovery structure step, a depth map D_i is estimated for each image with a deep-learning-based MVS algorithm, and the depth maps {D_i | i = 1, 2, ..., N} are fused to obtain the dense point cloud model M_dense of the large scene.
During image feature extraction, a convolutional neural network is used. The receptive field and aggregation weights of the CNN are adjusted adaptively according to the texture richness of the reconstructed target surface: deformable convolution adjusts the receptive field, and adaptive weight modulation adjusts the aggregation weights, as follows:
F'(p) = Σ_k w_k · F(p + o_k + Δo_k) · Δw_k
The offsets Δo_k, which vary with the current features, adaptively adjust the size and position of the convolution kernel, and the modulations Δw_k adaptively adjust the aggregation weights, so that depth features more favourable to the subsequent stereo matching are obtained.
After the features are acquired, randomly sampled depth hypotheses {d_k | k = 1, 2, ..., N_d} are generated for each feature pixel. Using a differentiable homography, the reference feature map F_ref is warped onto its N_s neighbouring source feature maps {F_i | i = 1, 2, ..., N_s}: a feature pixel p in F_ref, under depth hypothesis d_k, is projected into the source feature image F_i by
p_i(p, d_k) = K_i ( R_i R_ref^{-1} ( d_k K_ref^{-1} p − t_ref ) + t_i )
followed by dehomogenization.
A 3D cost volume is obtained from the distance between features: for source image F_i, under depth hypothesis d_k, the cost is generated with the two-norm of the feature difference:
C_i(p, d_k) = || F_i[p_i(p, d_k)] − F_ref(p) ||_2
Adaptive weights w(C_i(d_k)) are used to aggregate the 3D cost volumes of the N_s different views:
C(p, d_k) = Σ_{i=1}^{N_s} w(C_i(d_k)) · C_i(p, d_k)
Softmax is performed on the cost volume to generate a probability volume:
P = softmax(C)
The probability volume is used to take a weighted average of the depth hypotheses, giving the final depth map:
D(p) = Σ_{k=1}^{N_d} d_k · P(p, d_k)
and constructing a scene dense point cloud model through the depth map. For a certain pixel P in an image, obtaining a coordinate P of a 3D point in a real space according to a camera parameter t (R, t), K and a depth map D:
P=D(p)Τ -1 K -1 p
all 3D points jointly form a dense point cloud model M dense
(7) Post-processing: surface reconstruction, surface optimization, and texture mapping are performed on the dense point cloud M_dense of the scene to obtain a three-dimensional model M with a good visual effect.
The invention also provides a three-dimensional reconstruction system for realizing the three-dimensional reconstruction method under the large scene, which comprises the following steps:
the terminal equipment comprises an image acquisition module; the image acquisition module acquires RGB pictures or video data of a scene through image acquisition equipment;
the cloud device comprises an image preprocessing module; the image preprocessing module acquires scene RGB (red, green and blue) pictures or video data uploaded to the cloud equipment through the communication module and preprocesses the data;
the cloud device further comprises an image retrieval and matching module; the image retrieval and matching module is connected with the image preprocessing module and is used for rapidly matching the large-scale images by using the computing function of the cloud equipment;
the image feature extraction and matching module is connected with the image retrieval and matching module, and is used for extracting feature points from the image by using the computing function of the cloud equipment so as to realize rapid and accurate matching of the feature points;
the reconstruction module is connected with the image feature extraction and matching module, and carries out camera pose estimation and dense point cloud reconstruction on the image subjected to feature point extraction and matching by using the computing function of the cloud equipment;
the post-processing module is connected with the reconstruction module, and surface reconstruction and optimization and texture mapping are carried out on the reconstructed dense point cloud by using the computing function of the cloud equipment, so that the visual effect of the model is optimized; obtaining a reconstruction model with a good visual effect;
and the communication module is used for communication between the terminal equipment and the cloud equipment.
The invention also provides a three-dimensional reconstruction device for realizing the three-dimensional reconstruction system in the large scene, which comprises the following components: the system comprises terminal equipment and cloud equipment;
the terminal device is used for acquiring and storing image data of a reconstructed scene, and comprises a terminal device memory and a communication module, wherein the terminal device memory is used for storing acquired images, and the communication module is used for communicating with cloud equipment;
the cloud equipment is used for performing three-dimensional reconstruction according to the image data; the cloud device at least comprises a cloud device memory, a processor and a communication module;
the cloud device storage is used for storing a computer program for realizing the three-dimensional reconstruction method in the large scene according to any one of claims 1 to 8 and intermediate data; the processor comprises a Central Processing Unit (CPU) and a Graphics Processing Unit (GPU);
the processor is used for calling computer programs and data in a cloud device memory to realize the three-dimensional reconstruction method;
the communication module is used for being in communication connection with the terminal equipment.
Compared with the prior art, the invention has the beneficial effects that:
the invention preprocesses the image data of the obtained reconstruction target, thereby realizing the reconstruction of input pictures and videos with various formats; by adopting the image retrieval technology, the scene picture set can be effectively and globally described, similar scene images can be quickly found and matched according to the global descriptor, and a matched image pair is obtained. In the application of large-scene three-dimensional reconstruction, the image retrieval and matching technology provided by the embodiment of the application can effectively reduce the matching time while ensuring the accuracy; when the matching pair of the images is established, the quick and accurate matching pair generation is realized by using the image retrieval technology from the viewpoint of the visual similarity, so that the matching modes such as detailed matching, space matching and the like are avoided, the matching process does not depend on additional space information data such as POS (point of sale) and the like, and the scene image acquired from the image acquisition equipment without positioning and navigation can still be quickly and accurately matched and reconstructed; the problems of overlong image data matching time, low precision of partial scenes and insufficient integrity in large scenes are solved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic step diagram of a three-dimensional reconstruction method in a large scene provided by the present invention;
fig. 2 is a flowchart illustrating a three-dimensional reconstruction method in a large scene according to an embodiment of the present invention:
FIG. 3 is a schematic diagram of a three-dimensional reconstruction system in a large scene according to a second embodiment of the present invention;
fig. 4 is a schematic structural diagram of a three-dimensional reconstruction apparatus in a large scene according to an embodiment of the present invention.
The figure comprises the following components:
20. a terminal device; 201. an image acquisition module; 303. an image acquisition device; 21. cloud equipment; 202. an image preprocessing module; 207. a communication module; 203. an image retrieval and matching module; 204. an image feature extraction and matching module; 205. a reconstruction module; 206. a post-processing module; 304. a terminal device memory; 307. a cloud device memory; 306. a processor.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are one embodiment of the present invention, and not all embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without any creative work belong to the protection scope of the present invention.
Example one
Referring to fig. 1 and fig. 2, a three-dimensional reconstruction method in a large scene is provided in an embodiment of the present invention.
Fig. 2 shows a three-dimensional reconstruction method in a large scene according to an embodiment. The method comprises the following specific steps:
s101: and acquiring image data of the reconstructed target scene by the RGB image acquisition equipment. The captured image data supports a variety of formats including pictures or video.
The image acquisition equipment is any equipment for acquiring RGB images; including but not limited to smart phones, cameras, drones, etc. with image capture capabilities.
When the image acquisition device performs the acquisition task for the target scene, the images should cover all visible parts of the scene to be reconstructed; scene parts for which no RGB image data are acquired cannot be three-dimensionally reconstructed. During acquisition, the viewing angles should be as diverse as possible, and the scenes in different images should overlap to a certain extent. When changing the viewing angle, the change should as far as possible be made by translation, avoiding pure rotation. Images containing only a single solid-colour surface should be avoided; adjusting the angle or zoom distance so that richly textured parts of the scene are included, or adding posters, objects and the like to the solid-colour surface, can significantly enhance the reconstruction effect. Finally, the visual effect of the texture mapping depends on the resolution of the acquired image data: the higher the resolution, the better the visual effect.
S102: the images are preprocessed according to the acquired image format. If the acquired data are a video, the video is sampled: a threshold [N_min, N_max] on the number of sampled pictures is set, and the sampling rate r_s is adjusted according to the length of the video so that the number of frames after sampling falls within the threshold range. The scene picture set I = {I_i | i = 1, 2, ..., N} is obtained.
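As an illustration of the adaptive sampling in S102, the following sketch (assuming OpenCV for decoding; the bounds N_MIN and N_MAX are illustrative values, not prescribed by the embodiment) chooses a sampling step so that the number of kept frames falls inside [N_min, N_max]:

```python
# Minimal sketch of adaptive video sampling; N_MIN / N_MAX are assumed bounds.
import cv2

N_MIN, N_MAX = 200, 2000

def sample_video(path):
    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    # Choose a sampling step r_s so that the frame count lands in [N_MIN, N_MAX].
    step = max(1, total // N_MAX)
    if total // step < N_MIN:
        step = max(1, total // N_MIN)
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            frames.append(frame)
        idx += 1
    cap.release()
    return frames  # the scene picture set I = {I_i}
```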
S103, retrieving and matching all the images to obtain image matching pairs. The image retrieval uses a deep neural network-NetVLAD to carry out scene recognition, and matches the images of similar scenes to obtain a plurality of one-to-one image matching pairs. The image retrieval and matching method employed in this example is as follows:
By adopting the image retrieval technique, the scene picture set I can be described globally and effectively by the global descriptors G = {G_i | I_i ∈ I}. Images of similar scenes are found and matched according to the global descriptors, yielding the matched image pairs C = {{I_a, I_b} | I_a, I_b ∈ I, a < b}. In large-scene three-dimensional reconstruction, the image retrieval and matching technique provided by the embodiments of the application can effectively reduce the matching time while ensuring accuracy.
In this example, a NetVLAD-based method is used to extract the global descriptor of each picture, as follows: features of the scene image I_i are first extracted by VGG16, and a NetVLAD layer clusters the features to obtain a VLAD vector, which serves as the global descriptor G_i of the image and thus as a scene representation of the image. Images whose descriptors cluster together are matched, giving the one-to-one image matching pairs C. It is worth noting that the cluster centres also participate in the network training, so they can capture semantics, giving a better effect than traditional clustering methods.
Compared with the traditional retrieval method, the image retrieval method based on deep learning has better performance in some environments.
For a large number of unordered images, the time advantage of image retrieval over traditional exhaustive matching is significant. Compared with faster spatial matching methods, this approach does not require spatial POS information for the input image data, so it can serve reconstruction tasks using common image acquisition devices such as mobile phones. Meanwhile, this step is also the core of reducing the image matching time.
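A minimal sketch of this retrieval-based pair generation is given below; it assumes `netvlad` is a pretrained network (VGG16 backbone with a NetVLAD pooling layer) that maps an image tensor to a global descriptor, and the top-k value is an illustrative choice rather than a parameter fixed by the embodiment:

```python
# Sketch of global-descriptor retrieval and pair generation (assumptions noted above).
import torch
import torch.nn.functional as F

@torch.no_grad()
def build_match_pairs(images, netvlad, k=20):
    # Global descriptors G_i, L2-normalised so the inner product is cosine similarity.
    G = F.normalize(torch.stack([netvlad(img) for img in images]), dim=1)
    sim = G @ G.t()                                   # pairwise view similarity
    pairs = set()
    for a in range(len(images)):
        topk = torch.topk(sim[a], k + 1).indices.tolist()
        for b in topk:
            if b != a:
                pairs.add((min(a, b), max(a, b)))     # C = {{I_a, I_b} | a < b}
    return sorted(pairs)
```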
S104: feature points F = (x, d) are extracted for the scene picture set I = {I_i | i = 1, 2, ..., N}, where x is the keypoint and d is the descriptor. The deep neural network SuperPoint is used for feature point extraction and description. The feature point extraction and description method of this example includes:
A VGG-based convolutional neural network is used as the encoder to downsample and encode the W × H image into a feature map of size W/8 × H/8, which reduces the amount of computation and saves computation time.
Two decoders run in parallel: the keypoint decoder and the descriptor decoder obtain the keypoint representation and the descriptor representation of the feature points. The keypoint decoder first convolves the current feature map into a 65-channel feature map, corresponding to 64 region channels and one 'no keypoint' channel; a W × H × 1 keypoint map is finally obtained through Softmax and Reshape. The descriptor decoder adopts a UCN-like fully convolutional structure to extract more accurate geometric and semantic information, and then a W × H × D description matrix is obtained through bicubic interpolation and L2 normalization, corresponding to the D-dimensional descriptor d of each keypoint x.
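The two decoder heads can be sketched as follows; the shapes follow the description above (65 = 64 cell channels + 1 'no keypoint' channel at W/8 × H/8 encoder resolution), but the exact layer configuration of the published SuperPoint network is not reproduced here:

```python
# Sketch of the keypoint and descriptor heads, assuming the encoder output shapes above.
import torch
import torch.nn.functional as F

def keypoint_head(semi):
    # semi: (B, 65, H/8, W/8) -> 64 cell channels + 1 "no keypoint" dustbin channel
    prob = F.softmax(semi, dim=1)[:, :-1]                    # drop the dustbin channel
    b, _, hc, wc = prob.shape
    prob = prob.permute(0, 2, 3, 1).reshape(b, hc, wc, 8, 8)  # cells back to 8x8 pixels
    prob = prob.permute(0, 1, 3, 2, 4).reshape(b, hc * 8, wc * 8)
    return prob                                               # W x H keypoint heatmap

def descriptor_head(coarse_desc, h, w):
    # coarse_desc: (B, D, H/8, W/8) -> dense W x H x D descriptors
    desc = F.interpolate(coarse_desc, size=(h, w), mode='bicubic', align_corners=False)
    return F.normalize(desc, p=2, dim=1)                      # L2 normalisation per pixel
```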
S105: based on the matched image pairs C obtained in S103, feature point matching is performed within each image pair C_{a,b} = {I_a, I_b}, where the feature points F are those obtained in S104. The feature point matching algorithm adopted in this embodiment is SuperGlue, based on a graph convolutional neural network, which realizes the one-to-one matching A_{a,b} = {A(F_i, F_j) | F_i ∈ I_a, F_j ∈ I_b} of feature points between the image pair. The feature point matching method of this embodiment mainly includes:
Merging the feature point information. For a feature point (x_i, d_i), a multilayer perceptron merges the keypoint position and the description information:
y_i = d_i + MLP(x_i)
y_i is then used as the representation of the feature point.
An attention-based graph convolutional neural network aggregates feature points within and between images. The nodes of the graph are the feature points y_i; the edges are of two types: edges ε_s connecting feature points within an image and edges ε_c connecting feature points between images. ε_s reflects the neighbourhood information of a feature point, while ε_c describes the similarity of feature points across images. A multilayer GNN aggregates the neighbour features of each feature point and the cross-image similarity information, finally producing the matching description z_i of feature point y_i.
The optimal matching layer. The matching problem is converted into an optimal assignment problem, and feature point matching is solved by solving for the optimal assignment matrix. The similarity scores S are obtained from the inner products of the matching description vectors z_i across the two images, an assignment matrix is constructed, and the optimal assignment matrix is optimized with the Sinkhorn algorithm, realizing the feature point matching A_{a,b} = {A(F_i, F_j) | F_i ∈ I_a, F_j ∈ I_b} between the two images C_{a,b} = {I_a, I_b}.
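A sketch of this optimal matching step is given below: the score matrix built from the inner products of the matching descriptions z_i is normalized into a soft assignment matrix by Sinkhorn iterations (in log space for numerical stability); the dustbin rows and columns of the full SuperGlue formulation are omitted for brevity:

```python
# Sketch of Sinkhorn-based assignment over the similarity scores S (dustbins omitted).
import torch

def sinkhorn(scores, iters=50):
    # scores: (M, N) inner products of the matching descriptions z_i, z_j
    log_p = scores
    for _ in range(iters):
        log_p = log_p - torch.logsumexp(log_p, dim=1, keepdim=True)  # row normalise
        log_p = log_p - torch.logsumexp(log_p, dim=0, keepdim=True)  # column normalise
    return log_p.exp()            # approximately doubly-stochastic assignment matrix

def mutual_matches(assign, thresh=0.2):
    # Keep pairs that are each other's best assignment and exceed a confidence threshold.
    row_best = assign.argmax(dim=1)
    col_best = assign.argmax(dim=0)
    matches = []
    for i, j in enumerate(row_best.tolist()):
        if col_best[j].item() == i and assign[i, j] > thresh:
            matches.append((i, j))
    return matches
```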
S106: based on the matched images and the feature points solved in S105, the sparse point cloud model M_sparse of the three-dimensional scene is reconstructed through structure from motion (SfM), and the camera parameters Φ(K, R, t) of each picture are obtained at the same time.
The motion recovery structure technology adopted in this embodiment is incremental SfM, which mainly includes initialization, image registration, triangulation, and bundle adjustment optimization. The specific steps are as follows:
(1) Initialization. The initial matching image pair C_init is selected according to the number and distribution of the matching feature points A_{a,b} in the images. Initialization greatly affects the final reconstruction effect and completeness, so when selecting the initial image pair, the number of feature point matches obtained in step S105 and the uniformity of their distribution in the image are considered together, and the image pair with more matches and a more uniform distribution is selected for sparse reconstruction.
The relative camera pose, consisting of the rotation matrix R and the translation vector t, is computed from the epipolar geometric constraint, mainly by solving the epipolar constraint equation
x_j^T [t]_× R x_i = 0
(2) Triangulation. From the camera parameters Φ(K, R, t) and the feature point matching pairs A(F_i, F_j), the positions of the three-dimensional points in space are computed and the three-dimensional points are reconstructed. Given the camera pose R, t and the matched point positions x_i, x_j, the following relation holds:
Z_i (x_j × R x_i) + x_j × t = 0
Solving this equation with the coordinates of the matched feature points yields the depth Z_i of feature point x_i and hence its position in three-dimensional space.
Through initialization and triangulation, an initial model M_init containing only the information of two images is obtained.
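For illustration, the triangulation step can be sketched with OpenCV's DLT-based solver, which stands in for directly solving the constraint above; normalized image coordinates and the relative pose (R, t) from initialization are assumed:

```python
# Sketch of two-view triangulation under the assumptions stated above.
import cv2
import numpy as np

def triangulate(R, t, x_i, x_j):
    # x_i, x_j: (N, 2) matched, normalised feature coordinates in the two views
    P_i = np.hstack([np.eye(3), np.zeros((3, 1))])    # first camera at the origin
    P_j = np.hstack([R, t.reshape(3, 1)])             # second camera [R | t]
    X_h = cv2.triangulatePoints(P_i, P_j,
                                x_i.T.astype(np.float64),
                                x_j.T.astype(np.float64))   # 4 x N homogeneous points
    X = (X_h[:3] / X_h[3]).T                          # 3D points; depth Z_i = X[:, 2]
    return X
```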
(3) Image registration. After initialization, the remaining images are registered to the initial model M_init in a loop. In a given registration step, let the current model be M_now. The image that observes the most 3D points already reconstructed in M_now is chosen as the next image to register, I_next. The camera pose (R, t) of I_next is computed with PnP: spatially similar geometric relations give the coordinates of the three-dimensional points in the camera coordinate system, and from the coordinates {P_k | k = 1, 2, ..., n} of the n three-dimensional points in the world coordinate system and their coordinates {P_k' | k = 1, 2, ..., n} in the camera coordinate system, the camera pose is solved with the iterative closest point (ICP) algorithm, i.e. by optimizing:
min_{R,t} Σ_{k=1}^{n} || P_k' − (R P_k + t) ||^2
The feature points in the image are then converted into 3D coordinates by triangulation again and added to the scene model M_now, completing the registration of image I_next.
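As a sketch of registering the next image, OpenCV's RANSAC PnP solver can serve as an off-the-shelf stand-in for the PnP-plus-ICP procedure described above; the 2D-3D correspondences come from feature matches against points already in M_now:

```python
# Sketch of image registration via RANSAC PnP (a stand-in, not the exact PnP+ICP scheme).
import cv2
import numpy as np

def register_image(points_3d, points_2d, K):
    # points_3d: (n, 3) reconstructed points already in the model M_now
    # points_2d: (n, 2) pixel observations of those points in I_next
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        points_3d.astype(np.float64), points_2d.astype(np.float64),
        K, None, reprojectionError=4.0)
    if not ok:
        raise RuntimeError("registration failed")
    R, _ = cv2.Rodrigues(rvec)            # rotation matrix of the new camera
    return R, tvec, inliers
```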
(4) Bundle adjustment optimization (BA). The incremental reconstruction process accumulates errors, which causes drift of the scene. Therefore, during image registration, after a certain number of loop iterations, the reprojection error is reduced through bundle adjustment optimization; the camera poses and the three-dimensional scene points are optimized, improving the accuracy of the result.
The positions of the 3D points in the current model M_now and the camera parameters are adjusted so that the error of reprojecting the three-dimensional points into the images is minimized, i.e. the following cost function is optimized:
E(Φ, P) = Σ || h(Φ, P) − p ||^2
where Φ(K, R, t) are the camera parameters, P is the world coordinate of a three-dimensional point in model M_now, h(·) is the projection function, and p is the pixel coordinate in the image corresponding to the 3D point P. By adjusting the camera parameters Φ and the spatial points P, the above reprojection cost is minimized.
Finally, the sparse point cloud model M_sparse and the camera parameters Φ of each picture are obtained.
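The reprojection cost minimized by bundle adjustment can be illustrated as below; for brevity only one camera pose is refined against fixed 3D points with SciPy's least_squares, whereas a full BA solver optimizes all poses and points jointly:

```python
# Sketch of the reprojection cost h(Phi, P) - p, refining a single camera pose only.
import numpy as np
from scipy.optimize import least_squares

def project(params, P, K):
    # params = (rx, ry, rz, tx, ty, tz): axis-angle rotation + translation
    rvec, t = params[:3], params[3:]
    theta = np.linalg.norm(rvec) + 1e-12
    k = rvec / theta
    Kx = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    R = np.eye(3) + np.sin(theta) * Kx + (1 - np.cos(theta)) * (Kx @ Kx)  # Rodrigues
    Pc = (R @ P.T).T + t                  # world -> camera coordinates
    uv = (K @ Pc.T).T
    return uv[:, :2] / uv[:, 2:3]         # pixel coordinates h(Phi, P)

def refine_pose(params0, P, p_obs, K):
    residual = lambda params: (project(params, P, K) - p_obs).ravel()
    return least_squares(residual, params0).x
```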
S107: the dense point cloud model M_dense of the scene is reconstructed based on the image set I and the corresponding camera parameters Φ obtained from S106. The dense point cloud reconstruction method adopted in this embodiment is a deep-learning-based MVS network. Similar to the classic learning-based MVS network MVSNet, this example is likewise divided into four modules: feature extraction, cost volume construction, cost volume regularization, and depth regression.
During image feature extraction, a deformable convolution strategy is adopted: the CNN receptive field is enlarged in low-texture regions and reduced in texture-rich regions, which improves the reconstruction completeness of low-texture scene parts such as walls and solid-colour floors. When constructing the cost volume, the costs corresponding to images from different viewing angles need to be aggregated; because of illumination problems caused by occlusion and non-Lambertian surfaces, this embodiment performs weighted aggregation with learned adaptive weights instead of simple averaging during cost aggregation. Through these strategies, the reconstruction of low-texture, occluded, and non-Lambertian regions is improved. The specific process is as follows:
the input image is 1 reference image I ref And N s Sheet source image I source ={I i ∈Ω ref |i=1,2,...,N s }. Wherein Ω is ref Representing the spatial and reference images I ref A set of images captured by cameras whose corresponding cameras have a spatial neighboring relationship, which is determined from the camera pose (R, t) obtained in S106.
All N s Feature map of +1 images
Figure BDA0003679363630000151
Extracted by an encoder (convolutional neural network) with shared weights. In the process of extracting image features, in order to enhance the reconstruction effect in the low-texture region, by using deformable convolution, the receptive field of 2DCNN can be adaptively increased in the low-texture region, and the weighted aggregation process is adjusted by using weight offset, which is expressed as follows:
Figure BDA0003679363630000152
change of Δ o according to characteristics k By adaptively adjusting the size and position of the convolution kernel, change Δ w k The weighting weight is adaptively adjusted, so that depth features which are more beneficial to subsequent stereo matching are obtained.
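A sketch of this adaptive feature extraction is given below, using torchvision's modulated deformable convolution: one small convolution predicts the offsets Δo_k and another the modulation weights Δw_k; channel sizes and layer counts are illustrative rather than those of the embodiment:

```python
# Sketch of adaptive (modulated deformable) convolution for feature extraction.
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class AdaptiveConv(nn.Module):
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(c_out, c_in, k, k) * 0.01)  # w_k
        self.offset = nn.Conv2d(c_in, 2 * k * k, 3, padding=1)             # Delta o_k
        self.modul = nn.Conv2d(c_in, k * k, 3, padding=1)                  # Delta w_k
        self.k = k

    def forward(self, x):
        off = self.offset(x)                     # shifts the sampling locations
        m = torch.sigmoid(self.modul(x))         # re-weights each sampled value
        return deform_conv2d(x, off, self.weight, padding=self.k // 2, mask=m)
```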
After the features are acquired, randomly sampled depth hypotheses {d_k | k = 1, 2, ..., N_d} are generated for each feature pixel as N_d hypothesized depth planes. Using a differentiable homography, the reference feature map F_ref is warped onto its N_s neighbouring source feature maps {F_i | i = 1, 2, ..., N_s}: a feature pixel p in F_ref, on depth hypothesis plane d_k, is projected into the source feature image F_i by
p_i(p, d_k) = K_i ( R_i R_ref^{-1} ( d_k K_ref^{-1} p − t_ref ) + t_i )
followed by dehomogenization.
through the above transformation, the source characteristic image I source Warping to reference profile I ref To construct 3D cost body C = { C i |i=1,2,..,N s }. According to the similarity of the features, for the source image F i At depth hypothesis d k Generating a cost body by using a two-norm of the characteristic difference:
C i (p,d k )=||F i [p i (p,d k )]-F ref (p)|| 2
finally, N is required to be s Aggregation of 3D cost bodies by using adaptive weight w (C) i (d k ) Cost aggregation of different views):
Figure BDA0003679363630000162
performing Softmax on the cost body to generate a probability body:
P=softmax(C)
and carrying out weighted average on the depth hypothesis by using a probability body to obtain a final depth map:
Figure BDA0003679363630000163
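The cost aggregation and depth regression can be sketched as follows; `warped` is assumed to already hold the source features warped to the reference view for every depth hypothesis (the differentiable homography step above), and the softmax is taken over the negated cost so that a low matching cost corresponds to a high probability:

```python
# Sketch of cost aggregation and soft depth regression under the assumptions above.
import torch
import torch.nn.functional as F

def regress_depth(warped, feat_ref, depth_hyps, view_weights):
    # warped:       (N_s, N_d, C, H, W) source features warped to the reference view
    # feat_ref:     (C, H, W) reference feature map F_ref
    # depth_hyps:   (N_d,) depth hypotheses d_k
    # view_weights: (N_s, 1, H, W) per-view adaptive weights w(C_i)
    diff = warped - feat_ref[None, None]               # feature difference per view/depth
    cost_i = diff.norm(dim=2)                          # (N_s, N_d, H, W), two-norm cost
    w = F.softmax(view_weights, dim=0)                 # normalised view weights
    cost = (w * cost_i).sum(dim=0)                     # aggregated cost volume (N_d, H, W)
    prob = F.softmax(-cost, dim=0)                     # probability volume
    depth = (prob * depth_hyps.view(-1, 1, 1)).sum(dim=0)   # expected depth D(p)
    return depth, prob
```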
and constructing a scene dense point cloud model through the depth map. For a certain pixel P in an image, obtaining a coordinate P of a three-dimensional point in a real space according to a camera parameter Φ (t, K) and a depth map D:
P=D(p)Τ -1 K -1 p
and K is camera internal reference, and T is a camera pose and comprises a rotation matrix R and a translation vector T.
After the depth maps of all images are obtained through the deep-learning-based MVS of this embodiment, all depth maps are filtered and fused to obtain the dense point cloud M_dense of the scene.
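Lifting one depth map back to 3D points according to P = D(p)·T^{-1}K^{-1}p can be sketched as follows; the geometric-consistency filtering across neighbouring views is omitted:

```python
# Sketch of back-projecting a depth map to world-space 3D points (filtering omitted).
import numpy as np

def depth_to_points(depth, K, R, t):
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    rays = (np.linalg.inv(K) @ pix.T).T                  # K^-1 p for every pixel
    cam_pts = rays * depth.reshape(-1, 1)                # scale each ray by its depth D(p)
    world_pts = (R.T @ (cam_pts - t.reshape(1, 3)).T).T  # apply T^-1 (camera -> world)
    return world_pts[depth.reshape(-1) > 0]              # keep valid depths only
```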
S108: post-processing is performed on the dense point cloud model obtained in step S107, specifically comprising surface reconstruction, surface optimization and texture mapping, finally obtaining a three-dimensional reconstruction model with a good visual effect. The steps are as follows:
calculating normal parameters of the point cloud, and performing surface triangular mesh reconstruction to obtain a triangular mesh intermediate model;
carrying out mesh optimization based on the triangular mesh model to obtain a refined mesh model;
based on the grid model and the collected images, an optimal visual angle image is selected for each grid, pixels of the images are filled on the surface of the grid, and a reconstruction model with a good visual effect is obtained.
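A sketch of this post-processing chain using Open3D as one possible toolchain is given below (the embodiment does not prescribe a specific library); texture mapping from the selected best-view images would follow with a dedicated texturing tool:

```python
# Sketch of surface reconstruction and mesh optimisation with Open3D (an assumed toolchain).
import open3d as o3d

def postprocess(ply_path):
    pcd = o3d.io.read_point_cloud(ply_path)              # dense point cloud M_dense
    pcd.estimate_normals(
        search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.1, max_nn=30))
    mesh, _ = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(pcd, depth=10)
    mesh = mesh.simplify_quadric_decimation(target_number_of_triangles=500000)
    mesh = mesh.filter_smooth_taubin(number_of_iterations=10)   # mesh optimisation
    o3d.io.write_triangle_mesh("mesh.ply", mesh)
    return mesh
```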
In the embodiment of the disclosure, the image retrieval technology is mainly adopted to realize the rapid pairing of images aiming at a large scene with more reconstructed image data; based on the matching method of the SuperPoint characteristic points and the SuperGlue characteristic points, the speed and the precision of characteristic point matching are improved, so that the precision of camera pose calculation is effectively improved; then, the integrity of reconstruction is improved through an MVS algorithm based on deep learning; and finally, optimizing the point cloud model through a series of post-processing to obtain a final three-dimensional reconstruction model.
According to the invention, the combination of SuperPoint and SuperGlue is adopted in the process of extracting the image feature points and matching, compared with the mainstream SIFT feature and nearest neighbor matching at present, the method can provide more effective matching in some environments, and greatly improves the accuracy of camera pose calculation and the integrity of final reconstruction.
In the dense reconstruction process, the depth map estimation is carried out by adopting the MVS network based on the deep learning, and because the convolutional neural network is used when the characteristics are extracted, the extracted characteristics have global property to a certain extent, compared with the traditional method, the reconstruction deficiency in the weak texture area can be reduced to a certain extent.
Example two
Referring to fig. 3, a second embodiment of the present invention provides a three-dimensional reconstruction system using the three-dimensional reconstruction method in the first embodiment in a large scene.
Referring to fig. 3, the three-dimensional reconstruction system includes: terminal device 20 and cloud device 21.
The terminal device 20, the terminal device 20 includes an image acquisition module 201; the image capture module 201 captures RGB picture or video data of a scene via an image capture device 303.
The cloud device 21 comprises an image preprocessing module 202, an image retrieving and matching module 203, an image feature extracting and matching module 204, a reconstruction module 205 and a post-processing module 206.
The image preprocessing module 202 acquires scene RGB pictures or video data uploaded to the cloud device 21 through the communication module 207, and preprocesses the data; specifically, the image preprocessing module 202 is configured to execute the image preprocessing algorithm of step S102 in the first embodiment.
The image retrieval and matching module 203 is connected with the image preprocessing module 202, and performs fast matching on the large-scale image by using the calculation function of the cloud device 21; specifically, the image retrieving and matching module 203 is used for executing the image retrieving and matching algorithm of step S103 in the first embodiment, and the module accelerates the operation by using the GPU.
The image feature extraction and matching module 204 is connected with the image retrieval and matching module 203, and uses the computing function of the cloud device 21 to extract feature points from the image, so as to realize rapid and accurate matching of the feature points; specifically, the image feature extraction and matching module 204 is configured to execute the feature point extraction and matching algorithm of step S104 and step S105 in the first embodiment, and the module accelerates the operation by using the GPU.
The reconstruction module 205 is connected with the image feature extraction and matching module 204, and performs camera pose estimation and dense point cloud reconstruction on the image subjected to feature point extraction and matching by using the calculation function of the cloud device 21; specifically, the reconstruction module 205 is configured to perform the camera pose calculation and the dense point cloud reconstruction algorithm in steps S106 and S107 in the first embodiment, and the module accelerates the operation by using the GPU.
The post-processing module 206 is connected with the reconstruction module 205, and performs surface reconstruction and optimization and texture mapping on the reconstructed dense point cloud by using the computing function of the cloud device 21, so as to optimize the visual effect of the model; obtaining a reconstruction model with a good visual effect; specifically, the post-processing module 206 is configured to perform the surface reconstruction, surface optimization, and texture mapping algorithms of step S108 in the first embodiment, and the module accelerates the operation by using the GPU.
The communication module 207 is used for communication between the terminal device 20 and the cloud device 21. The module performs data transmission between the terminal device 20 and the cloud device 21 based on an internet communication protocol. The terminal device 20 uploads image data to the cloud device 21 through the communication module 207, and the cloud device 21 provides a downloading service for reconstructing the three-dimensional model to the terminal device 20 through the communication module 207.
Since the three-dimensional reconstruction system and the three-dimensional reconstruction method in the large scene correspond to each other, the embodiments of this section will not be specifically explained in comparison with the embodiments of the method section.
EXAMPLE III
Referring to fig. 4, a third embodiment of the present invention provides a three-dimensional reconstruction apparatus using the three-dimensional reconstruction system in the large scene according to the second embodiment.
Referring to fig. 4, the three-dimensional reconstruction apparatus includes: terminal device 20 and cloud device 21.
The terminal device 20 is configured to obtain and store an RGB image of a scene and store a generated three-dimensional reconstruction model, where the terminal device 20 includes a terminal device memory 304 and a communication module 207, where the image acquisition device 303 is configured to acquire scene image data; the terminal device memory 304 is used for storing image and model data; the communication module 207 is configured to transmit image data to the cloud device 21 through the internet and receive the three-dimensional reconstruction model generated by the cloud device 21.
The cloud device 21 is configured to reconstruct a three-dimensional model according to image data; the cloud device 21 at least includes a cloud device memory 307, a processor 306 and a communication module 207; the cloud device storage 307 is configured to store a computer program for implementing the three-dimensional reconstruction method in the large scene according to the first embodiment and intermediate data; the processor 306 comprises a Central Processing Unit (CPU) and a Graphics Processing Unit (GPU) with sufficient performance; the method for three-dimensional reconstruction under the large scene is supported to be realized; the processor 306 is configured to call the computer program and data in the cloud device memory 307 to implement the three-dimensional reconstruction method.
The communication module 207 is used for being in communication connection with the terminal device 20; specifically, the image data transmitted by the terminal device 20 is received through the internet and the generated three-dimensional reconstruction model is transmitted to the terminal device 20.
The invention also provides a system and a device for realizing the three-dimensional reconstruction in the large scene, wherein the system is used for realizing the reconstruction method, and the device is used for deploying the system.
The three-dimensional reconstruction method, the three-dimensional reconstruction system and the three-dimensional reconstruction device in the large scene can realize the separation of acquisition and calculation, the acquisition is realized by the terminal, and the cloud end carries out rapid calculation by using high-performance computing equipment. The method reduces the requirement on the computing performance of the terminal and enhances the possibility of wide application.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (10)

1. A three-dimensional reconstruction method under a large scene is characterized in that: the method comprises the following steps:
the method comprises the following steps of S1, acquiring image data of a reconstructed target through RGB image acquisition equipment, and preprocessing the image data;
s2, retrieving and matching the images, calculating the characteristic points of each image, and matching the characteristic points;
s3, calculating the corresponding camera pose of each image;
s4, obtaining a dense point cloud intermediate model of the scene according to the image and the corresponding camera pose;
and S5, post-processing the three-dimensional point cloud model to finally obtain a three-dimensional reconstruction grid model.
2. The method according to claim 1, characterized in that in step S1: the image data are multi-angle two-dimensional pictures or videos of the scene; preprocessing the image data comprises adaptively sampling the video and adjusting the sampling rate r_s according to the number and quality of the video frames so that the number of reconstruction pictures is limited to the range [N_min, N_max], obtaining a set of video frame pictures suitable for reconstruction I = {I_i | i = 1, 2, ..., N}; the acquisition device is any device for RGB image acquisition.
3. The three-dimensional reconstruction method under the large scene according to claim 2, characterized in that in step S2: all collected images are described and matched with a deep-learning-based image retrieval technique; for each image of the image set I, adaptive clustering and scene description are performed using the image's features, giving a global descriptor G_i; images with highly similar scene descriptions are found according to the similarity of the descriptors and form the matched image pairs C = {{I_a, I_b} | I_a, I_b ∈ I, a < b}.
4. The method according to claim 3, characterized in that: the feature points are extracted and described through a SuperPoint feature point extraction network based on a deep neural network, giving the feature point representation F = (x, d), where x is the keypoint and d is the descriptor.
5. The three-dimensional reconstruction method under the large scene according to claim 4, characterized in that: on the matched image pair C_{a,b} = {I_a, I_b}, matching of the feature points is completed and the feature point matching pairs A_{a,b} = {A(F_i, F_j) | F_i ∈ I_a, F_j ∈ I_b} are constructed; the matching uses SuperGlue based on a graph convolutional neural network, which aggregates feature point information within and between images based on an attention mechanism, enhancing the stability and accuracy of the matching.
6. The method for three-dimensional reconstruction in large scene according to claim 5, wherein: obtaining feature point matching pairs
Figure FDA0003679363620000022
Then, performing sparse point cloud M by using incremental SfM sparse The reconstruction of (a) and the estimation of camera parameters, the camera parameters including an external parameter T (R, T) and an internal parameter K; the method comprises the following steps:
(1) Initialization: according to matching feature points
Figure FDA0003679363620000023
Number and distribution selection in images of initially matching image pairs C init Selecting the image pair with the most matching points and the most uniform distribution according to the distribution scores; calculating the relative pose of the camera by using epipolar geometric constraint, wherein the relative pose comprises a rotation matrix R and a translation vector t;
(2) Triangularization: relative motion pose R, t through camera and matching point x i 、x j Calculating the position of the three-dimensional point in the space, and reconstructing the three-dimensional point:
Z i x j ×Rx i +x i ×t=0
by matching feature point coordinates x 1 、x 2 And pose R and t, solving the equation to obtain depth Z i Obtaining the position of the three-dimensional space; obtaining an initial model M containing only two images through an initialization and triangularization process init
(3) Image registration: registering the remaining images to the initial model M by PnP init Performing the following steps; the coordinates of the three-dimensional points under the camera coordinate system are obtained by utilizing the space similar geometric relation, and the coordinates { P ] of the n three-dimensional points under the space coordinate system are obtained k I k =1,2,. N } and coordinates { P ] in the camera coordinate system k ' | k =1,2,., n }, and solving the pose of the camera by using an iterative closest point algorithm ICP;
(4) Bundle adjustment optimization BA: the positions of the three-dimensional points and the camera parameters are adjusted so that the error of the reconstructed 3D points in the images is minimized, i.e. the following cost function is optimized:

E(Φ, P) = Σ || h(Φ, P) − p ||²

where Φ denotes the camera parameters, P the world coordinates of a three-dimensional point in space, h(·) the projection function, and p the pixel coordinates in the image corresponding to the three-dimensional point; the above reprojection error is minimized by adjusting Φ and P.
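A minimal sketch of the two-view initialization, triangulation and reprojection error of claim 6, using OpenCV as one possible implementation; the RANSAC parameters are illustrative assumptions, not values from the patent.

```python
import cv2
import numpy as np

def initialize_and_triangulate(pts_a, pts_b, K):
    """Two-view initialization of the sparse model M_init (steps 1-2 of claim 6).

    pts_a, pts_b: (M, 2) float arrays of matched pixel coordinates from the pair C_init.
    K: 3x3 intrinsic matrix.
    """
    E, _ = cv2.findEssentialMat(pts_a, pts_b, K, cv2.RANSAC, 0.999, 1.0)  # epipolar constraint
    _, R, t, _ = cv2.recoverPose(E, pts_a, pts_b, K)                      # relative pose R, t

    P_a = K @ np.hstack([np.eye(3), np.zeros((3, 1))])    # first camera at the origin
    P_b = K @ np.hstack([R, t])
    X_h = cv2.triangulatePoints(P_a, P_b, pts_a.T, pts_b.T)
    return R, t, (X_h[:3] / X_h[3]).T                      # (M, 3) triangulated 3D points

def reprojection_error(X, pts, rvec, tvec, K):
    """Residual minimized by bundle adjustment: || h(Phi, P) - p || per point."""
    proj, _ = cv2.projectPoints(X, rvec, tvec, K, None)
    return np.linalg.norm(proj.reshape(-1, 2) - pts, axis=1)
```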
7. The three-dimensional reconstruction method under the large scene according to claim 6, characterized in that: a depth map D of the image I is estimated by an MVS algorithm based on deep learning; the method further comprises:

in the image feature extraction process, deformable convolution is adopted to adaptively enlarge the receptive field of the 2D CNN in low-texture areas, and a modulation weight is used to adjust the weighted aggregation process, expressed as:

F_out(p) = Σ_k w_k · Δw_k · F_in(p + o_k + Δo_k)

The offsets Δo_k change with the features to adaptively adjust the size and position of the convolution sampling locations, and the modulation terms Δw_k adaptively adjust the aggregation weights, so that deep features more beneficial to the subsequent stereo matching are obtained and the completeness of the reconstruction is improved;
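A minimal sketch of this adaptive feature extraction step using modulated deformable convolution; torchvision's DeformConv2d is one possible building block, and the channel sizes, the 3x3 kernel and the sigmoid modulation are illustrative assumptions rather than the patent's design.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class AdaptiveFeatureConv(nn.Module):
    """Deformable 2D convolution with learned offsets Δo_k and modulation weights Δw_k."""

    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.offset = nn.Conv2d(in_ch, 2 * k * k, 3, padding=1)   # per-sample offsets Δo_k
        self.modulation = nn.Conv2d(in_ch, k * k, 3, padding=1)   # per-sample modulation Δw_k
        self.conv = DeformConv2d(in_ch, out_ch, k, padding=k // 2)

    def forward(self, x):
        offset = self.offset(x)                                   # adjust sampling positions
        mask = torch.sigmoid(self.modulation(x))                  # keep modulation in (0, 1)
        return self.conv(x, offset, mask)
```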
After the feature maps {F_i | i = 1, 2, ..., N} are obtained, randomly sampled depth hypotheses {d_k | k = 1, 2, ..., N_d} are generated for each feature pixel; using a differentiable homography, the reference feature map F_ref is warped onto its N_s neighboring source feature maps {F_i | i = 1, 2, ..., N_s}: a feature pixel p in F_ref, under the depth hypothesis d_k, is projected into the source feature map F_i at the position

p_i(p, d_k) ∝ K_i · (R_i · d_k · K_ref⁻¹ · p + t_i)

where K_ref, K_i are the intrinsics of the reference and source views and R_i, t_i the relative pose from the reference view to source view i.

A 3D cost volume is obtained from the feature similarity; for the source image F_i at depth hypothesis d_k, the matching cost is expressed as the two-norm of the feature difference:

C_i(p, d_k) = || F_i[p_i(p, d_k)] − F_ref(p) ||_2

Adaptive weights w(C_i(d_k)) are used to aggregate the 3D cost volumes of the N_s different views:

C(p, d_k) = Σ_{i=1}^{N_s} w(C_i(d_k)) · C_i(p, d_k) / Σ_{i=1}^{N_s} w(C_i(d_k))
Softmax is applied to the cost volume along the depth dimension to generate a probability volume:

P = softmax(C)

and the depth hypotheses are averaged, weighted by the probability volume, to obtain the final depth map estimate:

D(p) = Σ_{k=1}^{N_d} d_k · P(p, d_k)
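A minimal sketch of the cost aggregation, probability volume and depth regression just described. The tensor shapes and the sign convention (negating the cost before the softmax so that low-cost hypotheses receive high probability, a common practice) are assumptions for illustration; the warping step is assumed already done.

```python
import torch

def regress_depth(warped_feats, ref_feat, view_weights, depth_hyps):
    """Aggregate per-view matching costs into a probability volume and regress depth.

    warped_feats: (N_s, C, N_d, H, W) source features warped to the reference view
                  at every depth hypothesis.
    ref_feat:     (C, H, W) reference feature map F_ref.
    view_weights: (N_s, N_d, H, W) adaptive per-view weights w(C_i(d_k)).
    depth_hyps:   (N_d,) sampled depth hypotheses d_k.
    """
    # Per-view matching cost: two-norm of the feature difference.
    cost_i = torch.linalg.vector_norm(
        warped_feats - ref_feat[None, :, None], dim=1)            # (N_s, N_d, H, W)

    # Weighted aggregation over the N_s source views.
    cost = (view_weights * cost_i).sum(0) / view_weights.sum(0).clamp(min=1e-6)  # (N_d, H, W)

    # Softmax over the depth dimension; negate so that a low cost gets high probability.
    prob = torch.softmax(-cost, dim=0)                            # probability volume P

    # Expected depth: hypotheses averaged with the probability volume as weights.
    return (prob * depth_hyps[:, None, None]).sum(0)              # (H, W) depth map D
```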
A dense point cloud model is constructed from the depth map; for a pixel p in the image, the coordinates P of the corresponding 3D point in real space are obtained from the camera parameters T(R, t), K and the depth map D:

P = D(p) · T⁻¹ · K⁻¹ · p

All the 3D points together form the dense point cloud model M_dense.
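A minimal sketch of this back-projection, assuming a pinhole model with the extrinsics given as a world-to-camera rotation R and translation t; pixels with zero or invalid depth are skipped.

```python
import numpy as np

def depth_to_points(depth, K, R, t):
    """Back-project a depth map D into world-space 3D points of M_dense.

    depth: (H, W) depth map; K: 3x3 intrinsics; R, t: world-to-camera extrinsics.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    d = depth.reshape(-1)
    valid = d > 0

    cam = (np.linalg.inv(K) @ pix[valid].T) * d[valid]   # D(p) K^-1 p, camera coordinates
    world = R.T @ (cam - t.reshape(3, 1))                # invert the rigid transform T(R, t)
    return world.T                                       # (M, 3) world-space points
```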
8. The three-dimensional reconstruction method under the large scene according to claim 7, characterized in that: after the three-dimensional point cloud model M_dense of the scene is obtained, the point cloud is post-processed to obtain the reconstruction model M, including:

performing surface triangular mesh reconstruction on the point cloud to obtain an intermediate triangular mesh model M_mesh;

based on the mesh model and the collected images, selecting an optimal-view image for each mesh face and filling its pixels onto the mesh surface, obtaining a reconstruction model M with a good visual effect.
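A minimal sketch of the surface mesh reconstruction step, using Open3D's Poisson reconstruction as one possible implementation; the claim only requires a surface triangular mesh, so the algorithm choice and its parameters here are assumptions.

```python
import open3d as o3d

def mesh_from_point_cloud(points):
    """Build a triangular surface mesh M_mesh from the dense point cloud.

    points: (N, 3) numpy array of 3D points from M_dense.
    The normal-estimation radius and the Poisson octree depth are illustrative values.
    """
    pcd = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(points))
    pcd.estimate_normals(o3d.geometry.KDTreeSearchParamHybrid(radius=0.1, max_nn=30))
    mesh, _ = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(pcd, depth=9)
    return mesh
```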
9. A three-dimensional reconstruction system for implementing the three-dimensional reconstruction method in a large scene according to any one of claims 1 to 8, comprising:
a terminal device (20), the terminal device (20) comprising an image acquisition module (201); the image acquisition module (201) acquires RGB pictures or video data of the scene through an image acquisition device (303);

a cloud device (21), the cloud device (21) comprising an image preprocessing module (202); the image preprocessing module (202) receives the scene RGB pictures or video data uploaded to the cloud device (21) through the communication module (207) and preprocesses the data;

the cloud device (21) further comprises an image retrieval and matching module (203); the image retrieval and matching module (203) is connected with the image preprocessing module (202) and uses the computing capability of the cloud device (21) to rapidly match large-scale image sets;

an image feature extraction and matching module (204), connected with the image retrieval and matching module (203), which uses the computing capability of the cloud device (21) to extract feature points from the images and achieve rapid and accurate feature point matching;

a reconstruction module (205), connected with the image feature extraction and matching module (204), which uses the computing capability of the cloud device (21) to perform camera pose estimation and dense point cloud reconstruction on the images whose feature points have been extracted and matched;

a post-processing module (206), connected with the reconstruction module (205), which uses the computing capability of the cloud device (21) to perform surface reconstruction, optimization and texture mapping on the reconstructed dense point cloud, optimizing the visual effect of the model;

and a communication module (207), used for the communication between the terminal device (20) and the cloud device (21).
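A minimal sketch of how the cloud-side modules (202)-(206) of this system could be wired in sequence; the callables are hypothetical stand-ins for the modules, not an interface defined by the patent.

```python
from typing import Callable, Sequence

def cloud_reconstruction_pipeline(
    images: Sequence,
    preprocess: Callable,         # image preprocessing module (202)
    retrieve_pairs: Callable,     # image retrieval and matching module (203)
    extract_and_match: Callable,  # image feature extraction and matching module (204)
    reconstruct: Callable,        # reconstruction module (205): SfM + MVS
    postprocess: Callable,        # post-processing module (206): meshing + texturing
):
    """Run the cloud-side stages in the order described by claim 9."""
    images = preprocess(images)
    pairs = retrieve_pairs(images)
    matches = extract_and_match(images, pairs)
    cameras, dense_cloud = reconstruct(images, matches)
    return postprocess(dense_cloud, images, cameras)
```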
10. A three-dimensional reconstruction apparatus for implementing the three-dimensional reconstruction system in a large scene as claimed in claim 9, comprising: the system comprises a terminal device (20) and a cloud device (21);
the terminal device (20) is used for acquiring and storing image data of a reconstructed scene, the terminal device (20) comprises a terminal device memory (304) and a communication module (207), the terminal device memory (304) is used for storing acquired images, and the communication module (207) is used for communicating with a cloud device (21);
the cloud device (21) is used for performing three-dimensional reconstruction according to image data; the cloud device (21) at least comprises a cloud device memory (307), a processor (306) and a communication module (207);
the cloud device memory (307) is used for storing a computer program implementing the three-dimensional reconstruction method in a large scene according to any one of claims 1 to 8, together with intermediate data; the processor (306) comprises a central processing unit (CPU) and a graphics processing unit (GPU);
the processor (306) is used for calling the computer program and the data in the cloud device memory (307) to implement the three-dimensional reconstruction method;
the communication module (207) is used for being in communication connection with the terminal equipment (20).
CN202210630432.5A 2022-06-06 2022-06-06 Three-dimensional reconstruction method, system and device in large scene Pending CN115205489A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210630432.5A CN115205489A (en) 2022-06-06 2022-06-06 Three-dimensional reconstruction method, system and device in large scene

Publications (1)

Publication Number Publication Date
CN115205489A true CN115205489A (en) 2022-10-18

Family

ID=83576831

Country Status (1)

Country Link
CN (1) CN115205489A (en)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115937546A (en) * 2022-11-30 2023-04-07 北京百度网讯科技有限公司 Image matching method, three-dimensional image reconstruction method, image matching device, three-dimensional image reconstruction device, electronic apparatus, and medium
CN115578539A (en) * 2022-12-07 2023-01-06 深圳大学 Indoor space high-precision visual position positioning method, terminal and storage medium
CN115578539B (en) * 2022-12-07 2023-09-19 深圳大学 Indoor space high-precision visual position positioning method, terminal and storage medium
CN116704111A (en) * 2022-12-08 2023-09-05 荣耀终端有限公司 Image processing method and apparatus
CN115861546B (en) * 2022-12-23 2023-08-08 四川农业大学 Crop geometric perception and three-dimensional phenotype reconstruction method based on nerve volume rendering
CN115861546A (en) * 2022-12-23 2023-03-28 四川农业大学 Crop geometric perception and three-dimensional phenotype reconstruction method based on nerve body rendering
CN115719407A (en) * 2023-01-05 2023-02-28 安徽大学 Distributed multi-view stereo reconstruction method for large-scale aerial images
CN116258817A (en) * 2023-02-16 2023-06-13 浙江大学 Automatic driving digital twin scene construction method and system based on multi-view three-dimensional reconstruction
CN116258817B (en) * 2023-02-16 2024-01-30 浙江大学 Automatic driving digital twin scene construction method and system based on multi-view three-dimensional reconstruction
CN116524111B (en) * 2023-02-21 2023-11-07 中国航天员科研训练中心 On-orbit lightweight scene reconstruction method and system for supporting on-demand lightweight scene of astronaut
CN116310105A (en) * 2023-03-09 2023-06-23 广州沃佳科技有限公司 Object three-dimensional reconstruction method, device, equipment and storage medium based on multiple views
CN116310105B (en) * 2023-03-09 2023-12-05 广州沃佳科技有限公司 Object three-dimensional reconstruction method, device, equipment and storage medium based on multiple views
CN116503551A (en) * 2023-04-14 2023-07-28 海尔数字科技(上海)有限公司 Three-dimensional reconstruction method and device
CN116934829A (en) * 2023-09-15 2023-10-24 天津云圣智能科技有限责任公司 Unmanned aerial vehicle target depth estimation method and device, storage medium and electronic equipment
CN116934829B (en) * 2023-09-15 2023-12-12 天津云圣智能科技有限责任公司 Unmanned aerial vehicle target depth estimation method and device, storage medium and electronic equipment
CN117252996A (en) * 2023-11-20 2023-12-19 中国船舶集团有限公司第七〇七研究所 Data expansion system and method for special vehicle in cabin environment
CN117252996B (en) * 2023-11-20 2024-05-10 中国船舶集团有限公司第七〇七研究所 Data expansion system and method for special vehicle in cabin environment
CN117456130A (en) * 2023-12-22 2024-01-26 山东街景智能制造科技股份有限公司 Scene model construction method
CN117456130B (en) * 2023-12-22 2024-03-01 山东街景智能制造科技股份有限公司 Scene model construction method

Similar Documents

Publication Publication Date Title
CN115205489A (en) Three-dimensional reconstruction method, system and device in large scene
Tateno et al. Distortion-aware convolutional filters for dense prediction in panoramic images
CN115690324A (en) Neural radiation field reconstruction optimization method and device based on point cloud
CN109377530B (en) Binocular depth estimation method based on depth neural network
Wang et al. 360sd-net: 360 stereo depth estimation with learnable cost volume
WO2020001168A1 (en) Three-dimensional reconstruction method, apparatus, and device, and storage medium
CN106228507B (en) A kind of depth image processing method based on light field
CN108010123B (en) Three-dimensional point cloud obtaining method capable of retaining topology information
CN103839277B (en) A kind of mobile augmented reality register method of outdoor largescale natural scene
CN108876814B (en) Method for generating attitude flow image
CN110070598B (en) Mobile terminal for 3D scanning reconstruction and 3D scanning reconstruction method thereof
TW202117611A (en) Computer vision training system and method for training computer vision system
CN110580720B (en) Panorama-based camera pose estimation method
CN111553845B (en) Quick image stitching method based on optimized three-dimensional reconstruction
CN111951368B (en) Deep learning method for point cloud, voxel and multi-view fusion
CN111899295B (en) Monocular scene depth prediction method based on deep learning
CN113192179A (en) Three-dimensional reconstruction method based on binocular stereo vision
CN112767467B (en) Double-image depth estimation method based on self-supervision deep learning
CN114119739A (en) Binocular vision-based hand key point space coordinate acquisition method
WO2021035627A1 (en) Depth map acquisition method and device, and computer storage medium
CN111402412A (en) Data acquisition method and device, equipment and storage medium
CN114782628A (en) Indoor real-time three-dimensional reconstruction method based on depth camera
CN116958437A (en) Multi-view reconstruction method and system integrating attention mechanism
WO2021142843A1 (en) Image scanning method and device, apparatus, and storage medium
CN116912405A (en) Three-dimensional reconstruction method and system based on improved MVSNet

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination