WO2023280274A1 - Geometric structure aided visual localization method and system - Google Patents
Geometric structure aided visual localization method and system
- Publication number
- WO2023280274A1
- Authority
- WO
- WIPO (PCT)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30244—Camera pose
Definitions
- the visual structure may be constructed by leveraging a simultaneous localization and mapping (SLAM) or structure from motion (SfM) method, where the landmarks are filtered by the number of frames that can observe them.
- the geometric structure may be constructed using a point cloud representing shapes of landmarks in the 3D map.
- the Gaussian mixture model is expressed and parameterized as:

  p(x_i) = Σ_j w_j · N(x_i | μ_j, Σ_j)

- where x_i is the 3D position of the i-th mapped landmark, N(x_i | μ_j, Σ_j) is the likelihood of the landmark position x_i when the j-th Gaussian distribution is given, w_j is the weight of the j-th Gaussian distribution for each landmark, and μ_j and Σ_j are the mean and covariance of the j-th Gaussian distribution, respectively.
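A minimal numeric sketch of the mixture model above, assuming axis-aligned (diagonal) covariances for brevity; the function names and values are illustrative, not from the patent:

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of an axis-aligned (diagonal-covariance) Gaussian at point x."""
    d = len(x)
    norm = (2.0 * math.pi) ** (-d / 2.0) * math.prod(var) ** -0.5
    quad = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    return norm * math.exp(-0.5 * quad)

def gmm_likelihood(x, components):
    """p(x) = sum_j w_j * N(x | mu_j, Sigma_j) for a mapped landmark position x."""
    return sum(w * gaussian_pdf(x, mu, var) for w, mu, var in components)

# one unit-variance component centred exactly at the query point:
p = gmm_likelihood((1.0, 2.0, 3.0), [(1.0, (1.0, 2.0, 3.0), (1.0, 1.0, 1.0))])
# density at the mean of a standard 3D Gaussian is (2*pi)^(-3/2)
```

In a real map each landmark would be scored against the few mixture components whose voxels it intersects, not against the whole mixture.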
- the camera pose may be predicted through a visual-inertial odometry process including: obtaining real-time inertial measurement data by an inertial measurement unit of the mobile system; pre-integrating the real-time inertial measurement data into a single relative motion constraint; and estimating the camera pose based on the single relative motion constraint.
- the camera pose may be predicted through a place recognition process including: retrieving a plurality of keyframe candidates with a global descriptor based on the initialization position data; grouping the keyframe candidates into clusters by their covisibility with respect to one or more landmarks; matching the local features against each cluster to obtain correspondences between the real-time image frame and the keyframe candidates; recovering the camera pose via a perspective-n-point (PnP) scheme with a random sample consensus (RANSAC) method; and refining the camera pose with inlier matches.
- PnP perspective-n-point
- RANSAC random sample consensus
- the temporal visible landmarks may be identified with respect to the created keyframe by: projecting the mapped landmarks onto the created keyframe; and filtering the projected landmarks based on one or more visibility criteria to identify the projected landmarks which pass the visibility criteria as the one or more temporal visible landmarks with respect to the created keyframe.
- alternatively, the temporal visible landmarks may be identified with respect to the created keyframe by: dividing the 3D map into a set of voxels to form a voxelized map, wherein each mapped landmark is intersected with one or more voxels; casting a plurality of rays from an optical center of the created keyframe into the voxelized map; locating voxels along each casted ray; and identifying mapped landmarks which are intersected with the located voxels as temporal visible landmarks with respect to the created keyframe.
- each casted ray may be expressed as [v, t_wc], where t_wc is the position of the optical center in the world coordinate frame and v is the normalized direction vector of the casted ray.
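The ray-casting step above can be sketched as a simple voxel traversal. This is only an illustration: it samples points along the ray at sub-voxel steps, whereas an implementation would likely use an exact DDA-style traversal; all names and values are assumptions:

```python
import math

def voxel_index(p, voxel_size):
    """Integer voxel coordinates containing a 3D point."""
    return tuple(int(math.floor(c / voxel_size)) for c in p)

def voxels_along_ray(t_wc, v, voxel_size, max_dist, step=None):
    """Return voxel indices visited by the ray [v, t_wc], in order of first visit."""
    step = step or voxel_size / 4.0
    seen, order = set(), []
    t = 0.0
    while t <= max_dist:
        p = tuple(o + t * d for o, d in zip(t_wc, v))
        idx = voxel_index(p, voxel_size)
        if idx not in seen:
            seen.add(idx)
            order.append(idx)
        t += step
    return order

# a ray along +x from the center of voxel (0,0,0), unit voxels, 3 m range
vox = voxels_along_ray((0.5, 0.5, 0.5), (1.0, 0.0, 0.0), 1.0, 3.0)
```

Each returned voxel index would then be looked up to find the mapped landmarks (Gaussian components) it intersects.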
- each voxel in the retrieved map is examined for whether a geometric component (or a Gaussian distribution for a landmark) contained in the voxel can be associated with the local feature.
- a geometric structure parametrized as a 3D Gaussian distribution (μ, Σ) can be decomposed using singular value decomposition (SVD):

  Σ = R S R^T, where S = diag(σ_1, σ_2, σ_3) and R = [e_1, e_2, e_3]
- the ray parameters [v, t_wc] can be whitened by:

  v′ = S^(-1/2) R^T v,  t′_wc = S^(-1/2) R^T (t_wc − μ)

- the Mahalanobis distance computed from the whitened ray parameters [v′, t′_wc] can be used to determine whether the geometric component is intersected with the ray.
- the inverse depth of the intersected geometric component can then be obtained from the whitened parameters.
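Given the SVD factors above, the whitening and Mahalanobis gating of a point against a Gaussian component might look like the following pure-Python sketch; the chi-square gate value mentioned in the comment is an assumed threshold, not from the patent:

```python
import math

def mat_vec(R, x):
    """3x3 matrix times 3-vector."""
    return tuple(sum(R[i][j] * x[j] for j in range(3)) for i in range(3))

def transpose(R):
    return tuple(tuple(R[j][i] for j in range(3)) for i in range(3))

def whiten(p, mu, R, sigmas):
    """p' = S^(-1/2) R^T (p - mu) for Sigma = R S R^T, S = diag(sigmas)."""
    d = tuple(pi - mi for pi, mi in zip(p, mu))
    local = mat_vec(transpose(R), d)
    return tuple(c / math.sqrt(s) for c, s in zip(local, sigmas))

def mahalanobis_sq(p, mu, R, sigmas):
    """Squared Mahalanobis distance = squared norm of the whitened point."""
    return sum(c * c for c in whiten(p, mu, R, sigmas))

I3 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
d2 = mahalanobis_sq((2.0, 0.0, 0.0), (0.0, 0.0, 0.0), I3, (4.0, 1.0, 1.0))
# (2-0)^2 / 4 = 1.0; compare against e.g. a 3-dof chi-square gate (~7.815 at 95%)
```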
- the positions of identified temporal visible landmarks may be further optimized through an iterative closest point (ICP) scheme including: selecting, for each temporal visible landmark, a set of nearest projected Gaussian distributions as association candidates; optimizing the position of the temporal visible landmark by iterating over the association candidates; and selecting the final association candidate which leads to the minimum reprojection error.
- ICP iterative closest point
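The final candidate-selection step can be sketched as picking the association with minimum reprojection error; the toy 2D error function below is an assumption standing in for the actual residual:

```python
def best_association(landmark, candidates, reproj_error):
    """Pick the association candidate (e.g. a projected Gaussian mean)
    that leads to the minimum reprojection error for the landmark."""
    return min(candidates, key=lambda c: reproj_error(landmark, c))

# toy error: squared distance in a 2D image plane
err = lambda lm, c: (lm[0] - c[0]) ** 2 + (lm[1] - c[1]) ** 2
best = best_association((1.0, 1.0), [(0.0, 0.0), (1.0, 2.0), (5.0, 5.0)], err)
```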
- In Step S308, the state for the real-time image frame may be estimated by solving a least-squares optimization problem minimizing a total energy function E_total given by:

  E_total = E_visual + E_structure + E_prior

- where E_visual, E_structure and E_prior are the visual term, structure term and prior term of the total energy function, respectively.
- E_visual is given by:

  E_visual = Σ_{i,k} ρ( e_proj(x_i, ξ_k) )

- where ρ(·) is a probability density function;
- e_proj(x_i, ξ_k) is the reprojection residual term in projecting the i-th mapped landmark onto the k-th image frame, which depends on the predicted camera pose ξ_k of the k-th image frame and the 3D position x_i of the i-th mapped landmark;
- u_ik is the 2D location of the visual feature corresponding to the i-th temporal visible landmark in the k-th image frame, and R_k and t_k are the rotation and translation matrices for transforming the landmark from the world coordinate of the 3D map to the k-th image frame, respectively.
- the reprojection residual term e_proj may be expressed as:

  e_proj(x_i, ξ_k) = π(R_k x_i + t_k) − u_ik

- where π(·) is the camera projection function.
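A sketch of the reprojection residual with an assumed pinhole camera model; the intrinsics (fx, fy, cx, cy) are made-up values, not from the patent:

```python
def project(x_cam, fx=500.0, fy=500.0, cx=320.0, cy=240.0):
    """Pinhole projection pi(.) of a camera-frame 3D point to pixel coordinates."""
    X, Y, Z = x_cam
    return (fx * X / Z + cx, fy * Y / Z + cy)

def reprojection_residual(x_world, R_k, t_k, u_ik):
    """e_proj = pi(R_k x_i + t_k) - u_ik, with R_k, t_k mapping world to frame k."""
    x_cam = tuple(sum(R_k[i][j] * x_world[j] for j in range(3)) + t_k[i]
                  for i in range(3))
    u = project(x_cam)
    return (u[0] - u_ik[0], u[1] - u_ik[1])

I3 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
r = reprojection_residual((0.0, 0.0, 2.0), I3, (0.0, 0.0, 0.0), (320.0, 240.0))
# a point on the optical axis projects to the principal point -> zero residual
```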
- E_structure may be given by:

  E_structure = Σ_{(i,j)∈A} [ (1 − δ_i) ‖e_str(x_i, j)‖² + δ_i ‖e_str_deg(x_i, j)‖² ]

- where A is the association set for the structure term, in which the j-th Gaussian distribution is associated with the i-th mapped landmark;
- e_str is a residual term;
- e_str_deg is a residual term for a degenerated case, and δ_i is an indicator function for whether the i-th mapped landmark is degenerated or not;
- For a given likelihood of the i-th landmark, the structure residual term e_str is given by:

  e_str(x_i, j) = Σ_j^(-1/2) (x_i − μ_j)

- where μ_j and Σ_j are the mean and covariance of the j-th Gaussian distribution, respectively;
- For the degenerated case, the residual term e_str_deg is given by:

  e_str_deg(x_i, j) = e_j1^T (x_i − μ_j) / σ_str

- where x_i is the 3D position of the i-th mapped landmark, μ_j is the mean of the j-th Gaussian distribution, e_j1 is the first eigen-base for the j-th Gaussian distribution, and σ_str is the pre-defined variance of the geometric structure.
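The two structure residuals can be sketched as follows, assuming an axis-aligned covariance for e_str; these exact forms are reconstructions from the symbol definitions above, not quoted from the patent:

```python
import math

def e_str(x, mu, sigmas):
    """Whitened point-to-Gaussian residual, assuming an axis-aligned covariance
    Sigma_j = diag(sigmas): e_str = Sigma_j^(-1/2) (x_i - mu_j)."""
    return tuple((xi - mi) / math.sqrt(s) for xi, mi, s in zip(x, mu, sigmas))

def e_str_deg(x, mu, e_j1, sigma_str):
    """Degenerate (e.g. near-planar) case: scalar residual of the offset from the
    mean projected onto the eigen-base e_j1, scaled by the structure variance."""
    d = tuple(xi - mi for xi, mi in zip(x, mu))
    return sum(di * ei for di, ei in zip(d, e_j1)) / sigma_str

res = e_str((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (4.0, 1.0, 1.0))
res_deg = e_str_deg((1.0, 2.0, 3.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 0.5)
```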
- the prior term E prior may be defined as an initial residual term and given by:
- the optimized landmark position may be formulated as:

  x_i* = argmin_{x_i} Σ_k ‖e_proj(x_i, ξ_k)‖² / σ_ik²

- where e_proj(x_i, ξ_k) is the projection residual term, ξ_k is the pose parameterization of the k-th image frame, and σ_ik is the pre-defined variance for the feature corresponding to the i-th mapped landmark.
- the least-squares optimization problem may be solved by the Levenberg-Marquardt method, which gives a system as follows:

  (J^T W J + λ I) δ = − J^T W r

- where J and r are the stacked Jacobian matrices and residuals, respectively, W is the weight matrix from stacking the inverses of the covariances of the residual terms, and λ is the damping factor for the Levenberg-Marquardt method.
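The damped normal-equation step can be checked numerically; the sketch below solves the 2-parameter case by hand with W = I, purely as an illustration (the Jacobian and residual values are made up):

```python
def lm_step(J, r, lam):
    """One Levenberg-Marquardt update for a 2-parameter problem with W = I:
    solve (J^T J + lam*I) delta = -J^T r via the closed-form 2x2 inverse."""
    a11 = sum(row[0] * row[0] for row in J) + lam   # normal matrix A = J^T J + lam*I
    a12 = sum(row[0] * row[1] for row in J)
    a22 = sum(row[1] * row[1] for row in J) + lam
    b1 = sum(row[0] * ri for row, ri in zip(J, r))  # gradient b = J^T r
    b2 = sum(row[1] * ri for row, ri in zip(J, r))
    det = a11 * a22 - a12 * a12
    return (-(a22 * b1 - a12 * b2) / det, -(a11 * b2 - a12 * b1) / det)

# A = [[2,1],[1,2]], b = (4,5)  ->  delta = (-1, -2) with no damping
delta = lm_step([(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)], (1.0, 2.0, 3.0), 0.0)
```

With lam = 0 this reduces to a Gauss-Newton step; increasing lam shrinks the step toward gradient descent.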
- a batched yet efficient optimization problem may be formulated, which can be solved within 10 ms on average.
- the batched optimization problem may be treated as bundle adjustment (BA) which can be decoupled into visual-inertial (or motion-only or structureless) BA and structure-only BA.
- FIG. 4 shows an exemplary visual-inertial BA pipeline 400 according to one embodiment of the present invention.
- FIG. 5 shows a factor graph representation for the visual-inertial BA pipeline.
- the bundle adjustment 400 may include a frame tracking module 401, marginalization module 402, joint optimization module 403 and parameter updating module 404.
- although the decoupled scheme accelerates the optimization, iteratively re-linearizing the visual factors can be computationally inefficient, while the instant localization can already provide a relatively accurate estimation.
- the visual factors obtained via instant localization may be marginalized as a pose prior instead of using the original visual factors.
- visual factors may be “marginalized out” , which forms a prior term for the pose of the real-time image frame. This eliminates all the visual factors in the optimization, which significantly reduces the computational resources required for visual-inertial bundle adjustment.
- the estimated value and corresponding covariance preserved in the instant localization serve as a prior term in the bundle adjustment.
- the joint optimization adjusts the camera poses along with the inertial parameters, which includes the prior term in the energy function as follows:
- e rot and e trans are rotational and translational terms for the visual-inertial bundle adjustment.
- the camera pose may be optimized via minimizing an objective function:
- where F is the set of keyframes in the current window, whose modulus |F| is the number of keyframes;
- the problem can then be solved by the Levenberg-Marquardt method.
- the advantage of this scheme is that there is no need to re-linearize the visual factors, which can be more efficient in computation.
Abstract
A geometric structure aided visual localization method and a system for implementing the same are provided. The method comprises: retrieving a 3D map based on initialization position data, the 3D map being constrained with visual structure and geometric structure and modelled by a Gaussian mixture model with a set of Gaussian distributions of mapped landmarks (S301); obtaining a sequence of real-time image frames by a camera of a mobile system (S302); for each real-time image frame, extracting local features from the real-time image frame (S303); predicting a camera pose corresponding to the real-time image frame (S304); creating a keyframe by tracking the predicted camera pose in the 3D map (S305); identifying temporal visible landmarks with respect to the created keyframe (S306); obtaining 2D-3D correspondences between the local features and the temporal visible landmarks (S307); and locating the mobile system by estimating a state for the real-time image frame based on the 2D-3D correspondences (S308).
Description
The present invention generally relates to visual localization, and more specifically, to a geometric structure aided visual localization method and system.
Localization systems play a significant role in autonomous navigation and virtual/augmented reality, and several existing techniques are based on different sensor configurations. Techniques based on the global navigation satellite system and inertial navigation system (GNSS/INS) are widely applied for locating autonomous systems. However, they occasionally fail in urban canyons and are entirely disabled in indoor environments. An alternative solution is to utilize a three-dimensional (3D) light detection and ranging (LiDAR) device. A 3D LiDAR device scans the scene structure, which can be matched against a point cloud map built in advance. However, it is not easy to recover the global location by recognizing a place from an existing database with LiDAR scans, because the appearance information provided by a single scan is limited. In addition, GNSS/INS or LiDAR-based systems that have the capability to recover centimeter-level sensor poses are typically too expensive to be widely applied for navigation or other applications.
Traditional vision-based methods match the query image against a sparse 3D scene model enriched by visual features. The camera pose can then be estimated with three-dimensions-to-two-dimensions (3D-2D) correspondences. In contrast to GNSS/INS or LiDAR-based systems, visual localization systems can be much less expensive. However, this comes at a price: visual perception is sensitive to various conditions, and metric information cannot be directly obtained with vision sensors. These two drawbacks make common visual localization systems less robust and limit their application scenarios.
To alleviate the issues in visual localization systems, some methods introduce information from a dense map generated beforehand, which helps constrain the visual structure and thus leads to better pose estimation results. Yet, compared to traditional vision-based pipelines, these systems generally require more computational resources, which can only be fulfilled with a high-standard desktop. The computational requirement limits the large-scale application of these methods, especially on embedded devices. Therefore, developing a low-cost localization system while ensuring robustness, accuracy and scalability remains an open issue. It is also desirable to have a visual localization method and system that is computationally efficient, so as to have the potential to be applied on edge computing devices.
Summary of the Invention:
According to one aspect of the present invention, a geometric structure aided visual localization method is provided for navigating a mobile system. The method comprises: retrieving a 3D map based on initialization position data, the 3D map being constrained with visual structure and geometric structure and modelled by a Gaussian mixture model with a set of Gaussian distributions of mapped landmarks; obtaining a sequence of real-time image frames by a camera of the mobile system; for each real-time image frame, extracting one or more local features from the real-time image frame; predicting a camera pose corresponding to the real-time image frame; creating a keyframe by tracking the predicted camera pose in the 3D map; identifying one or more temporal visible landmarks with respect to the created keyframe; obtaining one or more two-dimensions-to-three-dimensions (2D-3D) correspondences between the local features and the temporal visible landmarks; and locating the mobile system by estimating a state for the real-time image frame based on the one or more 2D-3D correspondences.
According to another aspect of the present invention, a geometric structure aided visual localization system is provided for navigating a mobile system. The system comprises a camera configured to obtain a sequence of real-time image frames; a first processor including a mapping module configured to retrieve a 3D map based on initialization position data, the 3D map being constrained with visual structure and geometric structure and modelled by a Gaussian mixture model with a set of Gaussian distributions of mapped landmarks;
and a second processor including a localization module configured to process each real-time image frame by: extracting one or more local features from the real-time image frame; predicting a camera pose corresponding to the real-time image frame; creating a keyframe by tracking the predicted camera pose in the 3D map; identifying one or more temporal visible landmarks with respect to the created keyframe; obtaining one or more 2D-3D correspondences between the local features and the temporal visible landmarks; and locating the mobile system by estimating a state for the real-time image frame based on the one or more 2D-3D correspondences.
Preferably, the camera pose is predicted through a visual-inertial odometry process comprising: obtaining real-time inertial measurement data by an inertial measurement unit of the mobile system; pre-integrating the real-time inertial measurement data into a single relative motion constraint; and estimating the camera pose based on the single relative motion constraint.
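A heavily simplified sketch of pre-integrating inertial measurements into a single relative motion constraint; real pre-integration also tracks rotation, gravity and IMU biases, all of which are omitted here, and the sample values are illustrative:

```python
def preintegrate(samples, dt):
    """Pre-integrate accelerometer samples (body frame, gravity removed,
    rotation ignored for brevity) into one relative motion (dv, dp)."""
    dv = [0.0, 0.0, 0.0]  # accumulated velocity change
    dp = [0.0, 0.0, 0.0]  # accumulated position change
    for a in samples:
        for i in range(3):
            dp[i] += dv[i] * dt + 0.5 * a[i] * dt * dt
            dv[i] += a[i] * dt
    return tuple(dv), tuple(dp)

# constant 1 m/s^2 along x for 1 s at 100 Hz -> dv ~ (1, 0, 0), dp ~ 0.5 m
dv, dp = preintegrate([(1.0, 0.0, 0.0)] * 100, 0.01)
```

The resulting (dv, dp) plays the role of the single relative motion constraint between two camera frames, used to predict the next camera pose.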
Preferably, the one or more temporal visible landmarks are identified by: dividing the 3D map into a set of voxels to form a voxelized map, wherein each mapped landmark being intersected with one or more voxels; casting a plurality of rays from an optical center of the created keyframe into the voxelized map; for each casted ray, locating one or more voxels along the casted ray; and identifying one or more mapped landmarks which are intersected with the located voxels as temporal visible landmarks with respect to the created keyframe.
The provided visual localization method and system can take input data from a low-cost sensor suite, such as a visual-inertial sensor, and leverage other modalities, such as high-definition map generated by expensive LiDAR device and integrated navigation system. As such, advantages of both geometric structure and visual structure can be taken without an initial pose for bootstrapping the system. Localization accuracy and the computational efficiency can thus be well-balanced.
Aspects of the present disclosure may be readily understood from the following detailed description with reference to the accompanying figures. The illustrations may not necessarily be drawn to scale. Common reference numerals may be used throughout the drawings and the detailed description to indicate the same or similar components.
FIG. 1 shows a block diagram of a geometric structure aided visual localization system according to one embodiment of the present invention;
FIG. 2 shows exemplary sensor setup of the geometric structure aided visual localization system of FIG. 1;
FIG. 3 shows a flowchart of a localization method according to one embodiment of the present invention;
FIG. 4 shows an exemplary visual-inertial bundle adjustment pipeline according to one embodiment of the present invention; and
FIG. 5 shows a factor graph representation for the visual-inertial bundle adjustment pipeline of FIG. 4.
In the following description, preferred examples of the present disclosure will be set forth as embodiments which are to be regarded as illustrative rather than restrictive. Specific details may be omitted so as not to obscure the present disclosure; however, the disclosure is written to enable one skilled in the art to practice the teachings herein without undue experimentation.
FIG. 1 shows a block diagram of a geometric structure aided visual localization system according to one embodiment of the present invention. FIG. 2 shows exemplary sensor setup for the visual localization system of FIG. 1.
As shown, the visual localization system may be fully decoupled into a mapping module 11 and a localization module 12. The mapping module 11 may be operated offline in advance to generate an accurate localizable map, which is expected to be fully reusable for long-term localization. The prior localizable map can be built offline with more sophisticated sensors. To this end, the visual localization system may take input trajectories estimated from an integrated navigation system combining information from laser scanner, camera system and GNSS/INS (in functional block 102: integrated navigation) . In addition, both geometric and visual features are extracted from laser and visual data, respectively (in functional block 104: geometrical/visual feature extraction) . Finally, the geometric and visual features serve as map elements for constructing the localizable map. The features under local frames are transformed with the estimated poses and aggregated to form a localizable map (in functional block 106: map aggregation) .
With the prior localizable map, the localization module 12 can be operated separately. The localization module 12 takes as input a query image and/or inertial measurement data from a visual-inertial sensor suite. The visual-inertial sensor suite may include a camera system and an inertial measurement unit (IMU). For each input query image, local features are extracted for representing the current frame (in functional block 108: local feature extraction). The inertial measurement data since the last frame is pre-integrated into a relative transform and corresponding covariance (at step 110: pre-integration) to estimate a prior pose. The local features are first matched and the last frame is tracked with the prior pose as the initial guess (in functional block 112: inter-frame tracking). With a relatively accurate pose estimated, the local features are then associated with the map elements obtained from the localizable map to generate landmarks (in functional block 114: local feature association). Finally, the optimizable parameters are jointly adjusted for refinement of poses and inertial parameters (in functional block 116: joint optimization).
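The localization loop of blocks 108-116 can be outlined as a skeleton. Every helper below is a trivial stand-in so the loop runs end to end; none of them is the patent's actual method:

```python
def localize_stream(frames, imu_segments, map3d):
    """Skeleton of the localization loop in FIG. 1; the helper functions
    are placeholders for blocks 108-116, not part of the patent text."""
    state = None
    poses = []
    for frame, imu in zip(frames, imu_segments):
        feats = extract_local_features(frame)            # block 108
        prior_pose, cov = preintegrate_imu(imu, state)   # step 110
        tracked = track_inter_frame(feats, prior_pose)   # block 112
        landmarks = associate_with_map(tracked, map3d)   # block 114
        state = joint_optimize(tracked, landmarks, cov)  # block 116
        poses.append(state)
    return poses

# trivial stand-ins: the "pose" is a single number advanced by the IMU segment
extract_local_features = lambda frame: frame
preintegrate_imu = lambda imu, state: ((state or 0.0) + imu, 1.0)
track_inter_frame = lambda feats, prior: prior
associate_with_map = lambda tracked, m: m.get(round(tracked), [])
joint_optimize = lambda tracked, lms, cov: tracked

poses = localize_stream(["img0", "img1"], [1.0, 1.0], {})
```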
As shown in FIG. 2, the mapping module 11 may include a high-standard sensor suite including, but not limited to, laser scanners 22, camera system 24 and GNSS/INS 26 for obtaining high-accuracy, high-density data to construct the prior localizable and reusable map. The localization module 12 may include an inexpensive visual-inertial (VI) sensor 28 (e.g., the VI sensors equipped on a smart phone) to obtain visual data and inertial measurement data such that the cost can be reduced to the maximum extent.
The mapping module 11 and the localization module 12 may be implemented with one or more processors (not shown) which may be CPU, MCU, application specific integrated circuits (ASIC) , field programmable gate arrays (FPGA) or any suitable programmable logic devices configured or programmed to be a processor for performing the mapping and localization. The localization system may further include a memory unit (not shown) which may include a volatile memory unit (such as RAM) , a non-volatile unit (such as ROM, EPROM, EEPROM and flash memory) or both, or any type of media or devices suitable for storing instructions, codes, and/or data.
FIG. 3 shows a flowchart of a visual localization method according to one embodiment of the present invention. As shown, the visual localization method may comprise:
S301: retrieving a 3D map based on an initialization position data, the 3D map being constrained with visual structure and geometric structure and modelled by a Gaussian mixture model with a set of Gaussian distributions of mapped landmarks;
S302: obtaining a sequence of real-time image frames by a camera of a mobile system;
S303: for each real-time image frame, extracting one or more local features from the real-time image frame;
S304: predicting a camera pose corresponding to the real-time image frame;
S305: creating a keyframe by tracking the predicted camera pose in the 3D map;
S306: identifying one or more temporal visible landmarks with respect to the created keyframe;
S307: obtaining one or more 2D-3D correspondences between the local features and the temporal visible landmarks; and
S308: locating the mobile system by estimating a state for the real-time image frame based on the one or more 2D-3D correspondences.
In Step S301, the visual structure may be constructed by leveraging a simultaneous localization and mapping (SLAM) or structure-from-motion (SfM) method, where the landmarks are filtered by the number of frames that observe them. The geometric structure may be constructed using a point cloud representing the shapes of landmarks in the 3D map.
The Gaussian mixture model is expressed and parameterized as:

p(x_i) = Σ_j w_j · N(x_i; μ_j, Σ_j)

where x_i is the 3D position of the i-th mapped landmark; N(x_i; μ_j, Σ_j) is the likelihood of the landmark position x_i when the j-th Gaussian distribution is given; w_j is the weight of the j-th Gaussian distribution for each landmark; and μ_j and Σ_j are the mean and covariance of the j-th Gaussian distribution, respectively.
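For illustration only, evaluating the mixture likelihood above can be sketched in Python. The isotropic per-component covariance here is a simplifying assumption (the embodiment uses full 3D covariances), and all function and variable names are hypothetical:

```python
import math

def gaussian_pdf(x, mu, var):
    # Isotropic Gaussian N(x; mu, var*I) for brevity; the embodiment
    # uses full 3D covariances Sigma_j.
    d = sum((a - b) ** 2 for a, b in zip(x, mu))
    k = len(x)
    return math.exp(-0.5 * d / var) / ((2 * math.pi * var) ** (k / 2))

def mixture_likelihood(x, weights, means, variances):
    # p(x_i) = sum_j w_j * N(x_i; mu_j, Sigma_j)
    return sum(w * gaussian_pdf(x, mu, v)
               for w, mu, v in zip(weights, means, variances))
```

A single unit-weight component centered at the query point reduces to the standard normal density, which is a quick sanity check on the normalization.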
Each Gaussian distribution of a mapped landmark is parameterized as [μ, Σ] with Σ = R S R^T, where S = diag(λ_1, λ_2, λ_3); λ_1, λ_2, λ_3 are the singular values with λ_1 < λ_2 < λ_3; R = [e_1, e_2, e_3]; and e_1, e_2, e_3 are the eigen-bases.
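The decomposition Σ = R S R^T can be sketched with NumPy's symmetric eigendecomposition, which conveniently returns eigenvalues in the ascending order λ_1 < λ_2 < λ_3 used above. This is an illustrative sketch, not the embodiment's implementation:

```python
import numpy as np

def decompose_covariance(cov):
    """Decompose a 3x3 covariance as cov = R @ S @ R.T with
    S = diag(l1, l2, l3), l1 < l2 < l3, R = [e1, e2, e3]."""
    # For a symmetric positive-definite matrix, the eigendecomposition
    # coincides with its SVD.
    vals, vecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    S = np.diag(vals)
    R = vecs                           # columns are the eigen-bases e1, e2, e3
    return R, S

cov = np.diag([3.0, 1.0, 2.0])
R, S = decompose_covariance(cov)
assert np.allclose(R @ S @ R.T, cov)   # reconstruction check
```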
In Step S304, the camera pose may be predicted through a visual-inertial odometry process including: obtaining real-time inertial measurement data by an inertial measurement unit of the mobile system; pre-integrating the real-time inertial measurement data into a single relative motion constraint; and estimating the camera pose based on the single relative motion constraint.
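A minimal sketch of folding a batch of IMU samples into a single relative motion constraint, assuming bias-free, gravity-compensated measurements and a first-order exponential map; the covariance propagation that accompanies pre-integration is omitted, and all names are hypothetical:

```python
import numpy as np

def preintegrate(omegas, accels, dt):
    """Accumulate gyroscope (omegas) and accelerometer (accels) samples
    into one relative rotation R, velocity v, and position p."""
    R = np.eye(3)
    v = np.zeros(3)
    p = np.zeros(3)
    for w, a in zip(omegas, accels):
        p = p + v * dt + 0.5 * (R @ a) * dt * dt
        v = v + (R @ a) * dt
        # First-order approximation of the SO(3) exponential map
        # for the incremental rotation over dt.
        wx = np.array([[0.0, -w[2], w[1]],
                       [w[2], 0.0, -w[0]],
                       [-w[1], w[0], 0.0]])
        R = R @ (np.eye(3) + wx * dt)
    return R, v, p
```

With constant acceleration and no rotation, the result reduces to the familiar kinematic integrals, which makes the sketch easy to verify.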
Alternatively, the camera pose may be predicted through a place recognition process including: retrieving a plurality of keyframe candidates with a global descriptor based on the initialization position data; grouping the keyframe candidates into clusters by their covisibility with respect to one or more landmarks; matching the local features against each cluster to obtain a correspondence between the real-time image frame and the keyframe candidates; recovering the camera pose via a perspective-n-point (PnP) scheme with a random sample consensus (RANSAC) method; and refining the camera pose with inlier matches.
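The grouping of keyframe candidates by covisibility can be sketched as a union-find over shared landmark observations: two keyframes belong to the same cluster if they observe at least one common landmark. The observation structure below is a hypothetical stand-in, not the embodiment's data layout:

```python
def cluster_by_covisibility(observations):
    """Group keyframes that share at least one landmark.
    observations: dict keyframe_id -> set of landmark ids (hypothetical)."""
    parent = {k: k for k in observations}

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]   # path halving
            k = parent[k]
        return k

    def union(a, b):
        parent[find(a)] = find(b)

    by_landmark = {}
    for kf, lms in observations.items():
        for lm in lms:
            if lm in by_landmark:
                union(kf, by_landmark[lm])  # covisible: merge clusters
            else:
                by_landmark[lm] = kf

    clusters = {}
    for kf in observations:
        clusters.setdefault(find(kf), set()).add(kf)
    return list(clusters.values())
```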
In Step S306, the temporal visible landmarks may be identified with respect to the created keyframe by: projecting the mapped landmarks onto the created keyframe; and filtering the projected landmarks based on one or more visibility criteria to identify one or more projected landmarks which pass the visibility criteria as the one or more temporal visible landmarks with respect to the created keyframe.
Alternatively, the temporal visible landmarks may be identified with respect to the created keyframe by: dividing the 3D map into a set of voxels to form a voxelized map, wherein each mapped landmark intersects one or more voxels; casting a plurality of rays from an optical center of the created keyframe into the voxelized map; locating voxels along each casted ray; and identifying mapped landmarks which are intersected with the located voxels as temporal visible landmarks with respect to the created keyframe.
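Locating voxels along a casted ray can be sketched with uniform sub-voxel stepping, a simplified stand-in for an exact 3D-DDA traversal; the step size and all names are illustrative assumptions:

```python
def voxels_along_ray(origin, direction, voxel_size, max_dist, step=None):
    """Collect the voxel indices pierced by a ray via uniform sampling."""
    step = step or voxel_size * 0.5        # sub-voxel step avoids skipping cells
    n = sum(d * d for d in direction) ** 0.5
    d = [c / n for c in direction]         # normalized direction v
    voxels, seen, t = [], set(), 0.0
    while t <= max_dist:
        p = [o + t * c for o, c in zip(origin, d)]
        idx = tuple(int(c // voxel_size) for c in p)
        if idx not in seen:                # record each voxel once, in order
            seen.add(idx)
            voxels.append(idx)
        t += step
    return voxels
```

An exact traversal such as Amanatides-Woo would visit the same cells without oversampling; the uniform step keeps the sketch short.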
Each casted ray may be expressed as [v, t_wc], where t_wc is the position of the optical center in the world coordinate frame and v is the normalized direction vector of the casted ray. v may be computed as v = R_wc · π^(-1)(u), where u is the pixel location of each local feature, π denotes the camera projection function, and R_wc is the rotation matrix from the world coordinate frame to the camera coordinate frame. The position of each temporal visible landmark is parameterized as p = t_wc + (1/ρ) v, where p represents the positions of the located voxels and ρ is the inverse depth of the temporal visible landmark.
After casting the ray into the voxelized map, each voxel in the retrieved map is examined to determine whether a geometric component (or a Gaussian distribution for a landmark) contained in the voxel can be associated with the local feature. Assume a geometric structure parameterized as a 3D Gaussian distribution (μ, Σ), which can be decomposed using singular value decomposition (SVD): Σ = R S R^T, S = diag(λ_1, λ_2, λ_3), R = [e_1, e_2, e_3]. Applying the whitening transform, the ray parameters [v, t_wc] become:

ṽ = S^(-1/2) R^T v, t̃_wc = S^(-1/2) R^T (t_wc − μ)

The Mahalanobis distance between the whitened ray [ṽ, t̃_wc] and the distribution center can then be used to determine whether the geometric component is intersected with the ray, and is given by:

d_M = || t̃_wc − ((t̃_wc · ṽ)/(ṽ · ṽ)) ṽ ||

If the Mahalanobis distance is within a threshold, the inverse depth can be obtained through the ray parameter of the closest point:

1/ρ = −(t̃_wc · ṽ)/(ṽ · ṽ)
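A hedged NumPy sketch of this association test, using the standard whitening of a ray by a Gaussian. The exact expressions in the embodiment may differ (its equations were not reproduced in the published text), and the threshold value is an assumption:

```python
import numpy as np

def ray_gaussian_association(v, t_wc, mu, cov, thresh=3.0):
    """Whiten the ray [v, t_wc] by the Gaussian (mu, cov), test the
    Mahalanobis distance to the distribution center, and recover the
    inverse depth of the association. Returns None on a miss."""
    vals, vecs = np.linalg.eigh(cov)
    W = np.diag(1.0 / np.sqrt(vals)) @ vecs.T   # whitening S^-1/2 R^T
    v_w = W @ v
    t_w = W @ (t_wc - mu)                       # whitened ray parameters
    s = -t_w @ v_w / (v_w @ v_w)                # ray parameter of closest point
    d = np.linalg.norm(t_w + s * v_w)           # Mahalanobis distance
    if d > thresh or s <= 0.0:
        return None                             # behind camera or too far
    return 1.0 / s                              # inverse depth rho (p = t_wc + v/rho)
```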
Optionally, the positions of the identified temporal visible landmarks may be further optimized through an iterative closest point (ICP) scheme including: selecting, for each temporal visible landmark, a set of nearest projected Gaussian distributions as association candidates; optimizing the position of the temporal visible landmark by iterating over the association candidates; and selecting a final association candidate which leads to the minimum reprojection error.
In Step S308, the state for the real-time image frame may be estimated by solving a least-squares optimization problem minimizing a total energy function E_total given by:

E_total = E_visual + E_structure + E_prior

where E_visual, E_structure and E_prior are the visual term, structure term and prior term of the total energy function, respectively.
The visual term E_visual is given by:

E_visual = Σ_{(i,k)∈A_v} ρ( e_proj(x_i, ξ_k)^T Σ_ik^(-1) e_proj(x_i, ξ_k) )

where ρ(·) is a probability density function; e_proj(x_i, ξ_k) is the reprojection residual term for projecting the i-th mapped landmark onto the k-th image frame, which depends on the predicted camera pose ξ_k of the k-th image frame and the 3D position x_i of the i-th mapped landmark; Σ_ik is the covariance of the 2D Gaussian distribution of the i-th mapped landmark projected onto the k-th image frame; and A_v is the association set for the visual term.
In general, the reprojection residual term e_proj is given by:

e_proj(x_i, ξ_k) = u_ik − π(R_kw x_i + t_kw)

where u_ik is the 2D location of the visual feature corresponding to the i-th temporal visible landmark in the k-th image frame, and R_kw and t_kw are the rotation matrix and translation vector for projecting the landmark from the world coordinate frame of the 3D map to the k-th image frame, respectively.
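The general residual can be sketched with a pinhole projection model; the intrinsic parameters fx, fy, cx, cy are illustrative assumptions rather than values from the embodiment:

```python
import numpy as np

def reprojection_residual(u_ik, x_i, R_kw, t_kw, fx, fy, cx, cy):
    """e_proj = u_ik - pi(R_kw x_i + t_kw), with pi a pinhole projection."""
    p = R_kw @ x_i + t_kw                    # landmark in the k-th camera frame
    u_proj = np.array([fx * p[0] / p[2] + cx,
                       fy * p[1] / p[2] + cy])
    return u_ik - u_proj                     # 2D residual in pixels
```

A landmark on the optical axis projects to the principal point, so any feature offset from (cx, cy) shows up directly in the residual.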
When the camera pose is predicted through a visual-inertial odometry process, the reprojection residual term e_proj may be expressed as:

e_proj(x_i, ξ_k) = u_ik − π( R_cb (R_bw x_i + t_bw) + t_cb )

where R_bw and t_bw are the rotation matrix and translation vector for transforming from the world coordinate frame of the 3D map to the body coordinate frame of the inertial measurement unit with respect to the k-th image frame, and R_cb and t_cb are the rotation matrix and translation vector for transforming from the body coordinate frame of the inertial measurement unit to the k-th image frame.
The structure term E_structure may be given by:

E_structure = Σ_{(i,j)∈A_s} [ (1 − δ_i) e_str + δ_i e_str_deg ]

where A_s is the association set for the structure term; N(μ_j, Σ_j) is the j-th Gaussian distribution for the i-th mapped landmark; e_str is the residual term; e_str_deg is the residual term for the degenerated case; and δ_i is an indicator function for whether the i-th mapped landmark is degenerated.
For a given likelihood of the i-th landmark, the structure residual term e_str is given by:

e_str = (x_i − μ_j)^T Σ_j^(-1) (x_i − μ_j)

where x_i is the 3D position of the i-th mapped landmark, and μ_j and Σ_j are the mean and covariance of the j-th Gaussian distribution, respectively.
The structure residual term for the degenerated case, e_str_deg, is given by:

e_str_deg = (e_j1^T (x_i − μ_j))^2 / σ_str

where x_i is the 3D position of the i-th mapped landmark; μ_j is the mean of the j-th Gaussian distribution; e_j1 is the first eigen-base of the j-th Gaussian distribution; and σ_str is the pre-defined variance of the geometric structure.
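Both structure residuals can be sketched together. The degenerated-case form (squared projection onto the first eigen-base, scaled by σ_str) is a reconstruction from the surrounding definitions, not the embodiment's verbatim formula:

```python
import numpy as np

def structure_residual(x_i, mu_j, cov_j, degenerated, e_j1=None, sigma_str=1.0):
    """e_str (Mahalanobis form) for the regular case, or e_str_deg
    (first-eigen-base projection) for the degenerated case."""
    d = x_i - mu_j
    if not degenerated:
        return float(d @ np.linalg.inv(cov_j) @ d)   # e_str
    return float((e_j1 @ d) ** 2 / sigma_str)        # e_str_deg (sketch)
```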
The prior term E_prior may be defined as an initial residual term. Therefore, the optimized landmark position may be formulated as:

x_i* = argmin_{x_i} Σ_k e_proj(x_i, ξ_k)^T Σ_ik^(-1) e_proj(x_i, ξ_k)

where e_proj(x_i, ξ_k) is the projection residual term, ξ_k is the pose parameterization of the k-th image frame, and Σ_ik is the pre-defined variance for the feature corresponding to the i-th mapped landmark.
The least-squares optimization problem may be solved by the Levenberg-Marquardt method, which gives a system as follows:

H = J^T W J + ε I,
b = −J^T W r

where J and r are the stacked Jacobian matrices and residuals, respectively; W is the weight matrix formed by stacking the inverses of the covariances of the residual terms; and ε is the damping factor of the Levenberg-Marquardt method.
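One damped normal-equation update can be sketched directly from H and b above; this is a generic Levenberg-Marquardt step, not the embodiment's solver:

```python
import numpy as np

def lm_step(J, r, W, eps):
    """Solve (J^T W J + eps*I) dx = -J^T W r for one update dx."""
    H = J.T @ W @ J + eps * np.eye(J.shape[1])   # damped Gauss-Newton Hessian
    b = -J.T @ W @ r
    return np.linalg.solve(H, b)
```

With zero damping the step reduces to a weighted Gauss-Newton update, which makes a one-parameter linear problem easy to check by hand.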
In the case that some of the states cannot be properly estimated, especially when the inertial measurements are incorporated, a batched yet efficient optimization problem may be formulated, which can be solved within 10 ms on average. First, the batched optimization problem may be treated as bundle adjustment (BA), which can be decoupled into visual-inertial (or motion-only or structureless) BA and structure-only BA.
FIG. 4 shows an exemplary visual-inertial BA pipeline 400 according to one embodiment of the present invention. FIG. 5 shows a factor graph representation for the visual-inertial BA pipeline. The bundle adjustment 400 may include a frame tracking module 401, marginalization module 402, joint optimization module 403 and parameter updating module 404. Although the decoupled scheme accelerates the optimization, iteratively re-linearizing the visual factors can be less effective in computation, while the instant localization can already provide a relatively accurate estimation. As shown in FIGS. 4 and 5, the visual factors obtained via instant localization may be marginalized as a pose prior instead of using the original visual factors. In other words, after optimizing the state for the real-time image frame, visual factors may be “marginalized out” , which forms a prior term for the pose of the real-time image frame. This eliminates all the visual factors in the optimization, which significantly reduces the computational resources required for visual-inertial bundle adjustment.
More specifically, the estimated value and corresponding covariance preserved from the instant localization serve as a prior term in the bundle adjustment. The joint optimization adjusts the camera poses along with the inertial parameters, and includes in the energy function a prior term composed of rotational and translational components e_rot and e_trans for the visual-inertial bundle adjustment.
Accordingly, the camera pose may be optimized by minimizing an objective function over F, the set of keyframes in the current window, whose cardinality |F| may be set to 15.
Assuming the pose estimation reaches the local optimum, the update of the state, and hence the prior error state, can be formulated from the marginalized estimate, and the total objective function is modified to include this prior error state.
The problem can then be solved by the Levenberg-Marquardt method. The advantage of this scheme is that there is no need to re-linearize the visual factors, which is more efficient in computation.
The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, thereby enabling others skilled in the art to understand the invention for various embodiments and with various modifications that are suited to the particular use contemplated. While the methods disclosed herein have been described with reference to particular operations performed in a particular order, it will be understood that these operations may be combined, sub-divided, or re-ordered to form an equivalent method without departing from the teachings of the present disclosure. Accordingly, unless specifically indicated herein, the order and grouping of the operations are not limitations. While the apparatuses disclosed herein have been described with reference to particular structures, shapes, materials, composition of matter, relationships, etc., these descriptions and illustrations are not limiting. Modifications may be made to adapt a particular situation to the objective, spirit and scope of the present disclosure. All such modifications are intended to be within the scope of the claims appended hereto.
Claims (20)
- A geometric structure aided visual localization method, comprising:
retrieving a 3D map based on an initialization position data, the 3D map being constrained with visual structure and geometric structure and modelled by a Gaussian mixture model with a set of Gaussian distributions of mapped landmarks;
obtaining a sequence of real-time image frames by a camera of a mobile system;
for each real-time image frame:
extracting one or more local features from the real-time image frame;
predicting a camera pose corresponding to the real-time image frame;
creating a keyframe by tracking the predicted camera pose in the 3D map;
identifying one or more temporal visible landmarks with respect to the created keyframe;
obtaining one or more 2D-3D correspondences between the local features and the temporal visible landmarks; and
locating the mobile system by estimating a state for the real-time image frame based on the one or more 2D-3D correspondences.
- The geometric structure aided visual localization method according to claim 1, wherein the camera pose is predicted through a place recognition process comprising:
retrieving a plurality of keyframe candidates with a global descriptor based on the initialization position data;
grouping the keyframe candidates into clusters by their covisibility;
matching the local features against each cluster to obtain a correspondence between the real-time image frame and the keyframe candidates;
recovering the camera pose via a perspective-n-point (PnP) scheme with a random sample consensus (RANSAC) method; and
refining the camera pose with inlier matches.
- The geometric structure aided visual localization method according to claim 1, wherein the camera pose is predicted through a visual-inertial odometry process comprising:
obtaining real-time inertial measurement data by an inertial measurement unit of the mobile system;
pre-integrating the real-time inertial measurement data into a single relative motion constraint; and
estimating the camera pose based on the single relative motion constraint.
- The geometric structure aided visual localization method according to claim 2 or 3, wherein the step of identifying the one or more temporal visible landmarks includes:
projecting the mapped landmarks onto the created keyframe; and
filtering the projected landmarks based on one or more visibility criteria to identify one or more projected landmarks which pass the visibility criteria as the one or more temporal visible landmarks with respect to the created keyframe.
- The geometric structure aided visual localization method according to claim 2 or 3, wherein the step of identifying the one or more temporal visible landmarks includes:
dividing the 3D map into a set of voxels to form a voxelized map, wherein each mapped landmark is intersected with one or more voxels;
casting a plurality of rays from an optical center of the created keyframe into the voxelized map; and
for each casted ray:
locating one or more voxels along the casted ray; and
identifying one or more mapped landmarks which are intersected with the located voxels as temporal visible landmarks with respect to the created keyframe.
- The geometric structure aided visual localization method according to claim 4 or 5, wherein the step of estimating the state for the real-time image frame includes solving a least-squares optimization problem and minimizing a total energy function E_total given by:
E_total = E_visual + E_structure + E_prior,
where E_visual, E_structure and E_prior are the visual term, structure term and prior term of the total energy function respectively.
- The geometric structure aided visual localization method according to claim 6, wherein positions of the temporal visible landmarks are optimized through an iterative closest point (ICP) scheme.
- The geometric structure aided visual localization method according to claim 7, wherein the ICP scheme includes:
selecting, for each temporal visible landmark, a set of nearest projected Gaussian distributions as association candidates;
optimizing the position of the temporal visible landmark by iterating over the association candidates; and
selecting a final association candidate which leads to the minimum reprojection error.
- The geometric structure aided visual localization method according to claim 1, further comprising constructing the 3D map with a point cloud which is generated by a LiDAR.
- The geometric structure aided visual localization method according to claim 1, wherein the initialization position data is obtained through a GNSS/INS unit.
- A geometric structure aided visual localization system, comprising:
a camera configured to obtain a sequence of real-time image frames;
a first processor including a mapping module configured to retrieve a 3D map based on an initialization position data, the 3D map being constrained with visual structure and geometric structure and modelled by a Gaussian mixture model with a set of Gaussian distributions of mapped landmarks; and
a second processor including a localization module configured to process each real-time image frame by:
extracting one or more local features from the real-time image frame;
predicting a camera pose corresponding to the real-time image frame;
creating a keyframe by tracking the predicted camera pose in the 3D map;
identifying one or more temporal visible landmarks with respect to the created keyframe;
obtaining one or more 2D-3D correspondences between the local features and the temporal visible landmarks; and
locating the mobile system by estimating a state for the real-time image frame based on the one or more 2D-3D correspondences.
- The geometric structure aided visual localization system according to claim 11, wherein the localization module is further configured to predict the camera pose corresponding to the real-time image frame by:
retrieving a plurality of keyframe candidates with a global descriptor based on the initialization position data;
grouping the keyframe candidates into clusters by their covisibility;
matching the local features against each cluster to obtain a correspondence between the real-time image frame and the keyframe candidates;
recovering the camera pose via a perspective-n-point (PnP) scheme with a random sample consensus (RANSAC) method; and
refining the camera pose with inlier matches.
- The geometric structure aided visual localization system according to claim 11, further comprising an inertial measurement unit configured to obtain real-time inertial measurement data;
wherein the localization module is further configured to predict the camera pose corresponding to the real-time image frame by:
pre-integrating the real-time inertial measurement data into a single relative motion constraint; and
estimating the camera pose based on the single relative motion constraint.
- The geometric structure aided visual localization system according to claim 12 or 13, wherein the localization module is further configured to identify the one or more temporal visible landmarks by:
projecting the mapped landmarks onto the created keyframe; and
filtering the projected landmarks based on one or more visibility criteria to identify one or more projected landmarks which pass the visibility criteria as the one or more temporal visible landmarks with respect to the created keyframe.
- The geometric structure aided visual localization system according to claim 12 or 13, wherein the localization module is further configured to identify the one or more temporal visible landmarks by:
dividing the 3D map into a set of voxels to form a voxelized map, wherein each mapped landmark is intersected with one or more voxels;
casting a plurality of rays from an optical center of the created keyframe into the voxelized map; and
for each casted ray:
locating one or more voxels along the casted ray; and
identifying one or more mapped landmarks which are intersected with the located voxels as temporal visible landmarks with respect to the created keyframe.
- The geometric structure aided visual localization system according to claim 14 or 15, wherein the localization module is further configured to estimate the state for the real-time image frame by solving a least-squares optimization problem and minimizing a total energy function E_total given by:
E_total = E_visual + E_structure + E_prior,
where E_visual, E_structure and E_prior are the visual term, structure term and prior term of the total energy function respectively.
- The geometric structure aided visual localization system according to claim 16, wherein the localization module is further configured to optimize positions of the temporal visible landmarks through an iterative closest point (ICP) scheme.
- The geometric structure aided visual localization system according to claim 17, wherein the localization module is further configured to perform the ICP scheme by:
selecting, for each temporal visible landmark, a set of nearest projected Gaussian distributions as association candidates;
optimizing the position of the temporal visible landmark by iterating over the association candidates; and
selecting a final association candidate which leads to the minimum reprojection error.
- The geometric structure aided visual localization system according to claim 11, comprising a LiDAR configured to generate a point cloud for constructing the 3D map.
- The geometric structure aided visual localization system according to claim 11, comprising a GNSS/INS configured to obtain the initialization position data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202280048273.6A CN117616459A (en) | 2021-07-07 | 2022-07-07 | Geometry-aided visual localization method and system |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202163218925P | 2021-07-07 | 2021-07-07 | |
US63/218,925 | 2021-07-07 | ||
US202163227352P | 2021-07-30 | 2021-07-30 | |
US63/227,352 | 2021-07-30 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023280274A1 true WO2023280274A1 (en) | 2023-01-12 |
Family
ID=84800336
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/104407 WO2023280274A1 (en) | 2021-07-07 | 2022-07-07 | Geometric structure aided visual localization method and system |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2023280274A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117475092A (en) * | 2023-12-27 | 2024-01-30 | 安徽蔚来智驾科技有限公司 | Pose optimization method, pose optimization equipment, intelligent equipment and medium |
WO2024181903A1 (en) * | 2023-02-28 | 2024-09-06 | Telefonaktiebolaget Lm Ericsson (Publ) | Evaluating and displaying information for improved communication channel |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140333741A1 (en) * | 2013-05-08 | 2014-11-13 | Regents Of The University Of Minnesota | Constrained key frame localization and mapping for vision-aided inertial navigation |
CN108717710A (en) * | 2018-05-18 | 2018-10-30 | 京东方科技集团股份有限公司 | Localization method, apparatus and system under indoor environment |
CN108986037A (en) * | 2018-05-25 | 2018-12-11 | 重庆大学 | Monocular vision odometer localization method and positioning system based on semi-direct method |
CN110782494A (en) * | 2019-10-16 | 2020-02-11 | 北京工业大学 | Visual SLAM method based on point-line fusion |
WO2021004416A1 (en) * | 2019-07-05 | 2021-01-14 | 杭州海康机器人技术有限公司 | Method and apparatus for establishing beacon map on basis of visual beacons |
CN112596064A (en) * | 2020-11-30 | 2021-04-02 | 中科院软件研究所南京软件技术研究院 | Laser and vision integrated indoor robot global positioning method |
Non-Patent Citations (2)
Title |
---|
YE HAOYANG; HUANG HUAIYANG; HUTTER MARCO; SANDY TIMOTHY; LIU MING: "3D Surfel Map-Aided Visual Relocalization with Learned Descriptors", 2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), IEEE, 30 May 2021 (2021-05-30), pages 5574 - 5581, XP033989857, DOI: 10.1109/ICRA48506.2021.9561005 * |
YE HAOYANG; HUANG HUAIYANG; LIU MING: "Monocular Direct Sparse Localization in a Prior 3D Surfel Map", 2020 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), IEEE, 31 May 2020 (2020-05-31), pages 8892 - 8898, XP033826358, DOI: 10.1109/ICRA40945.2020.9197022 * |