WO2012166814A1 - Online environment mapping - Google Patents

Online environment mapping

Info

Publication number
WO2012166814A1
WO2012166814A1
Authority
WO
WIPO (PCT)
Prior art keywords
keyframes
environment map
keyframe
video sequence
video
Prior art date
Application number
PCT/US2012/040028
Other languages
English (en)
Inventor
Jongwoo Lim
Jan-Michael Frahm
Marc Pollefeys
Original Assignee
Honda Motor Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Honda Motor Co., Ltd. filed Critical Honda Motor Co., Ltd.
Publication of WO2012166814A1

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/55 Depth or shape recovery from multiple images
    • G06T7/579 Depth or shape recovery from multiple images from motion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7837 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20072 Graph-based image processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30241 Trajectory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30244 Camera pose

Definitions

  • the invention relates generally to vision-based mapping and in particular to real-time metric map reconstruction of the environment visited by a navigation device using a hybrid representation of a fully metric Euclidean environment map and a topological map.
  • LIDAR: laser range finders
  • SLAM: simultaneous localization and mapping
  • Topological mapping can be used for online computation in large-scale environments.
  • Topological mapping represents the environment as a graph with a set of places (nodes) and the relative location information between the places (edges). In this representation, a loop closure does not require any additional error adjustment. However, in return, it loses the global metric property. For example, a robot cannot perform spatial reasoning for proximity unless the link between the map locations is present in the topological map.
  • Figure 1 illustrates a computer system for online environment mapping according to one embodiment of the invention.
  • Figure 2 is a system level flowchart of generating a map of the environment contained in an input video stream in real-time according to one embodiment of the invention.
  • Figure 3 illustrates examples of a keyframe pose graph of the environment contained in an input video stream and its corresponding environment map embedding of the keyframes with associated landmarks according to one embodiment of the invention.
  • Figure 4A illustrates an example of a keyframe pose graph with local adjustment according to one embodiment of the invention.
  • Figure 4B illustrates an example of a keyframe pose graph with detected loop closures and local adjustment to the keyframe pose graph according to one embodiment of the invention.
  • Figure 5 is an example of a global adjustment procedure of an environment map according to one embodiment of the invention.
  • Figure 6 is an example computer system for online environment mapping using a hybrid representation of a metric Euclidean environment map and a topological map according to one embodiment of the invention.
  • the invention also relates to an apparatus for performing the operations herein.
  • This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer.
  • a computer program may be stored in a non-transitory computer readable storage medium, such as, but not limited to, any type of disk including floppy disks and optical disks.
  • the computers referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.
  • Embodiments of the invention provide a solution to the online mapping of large- scale environments using a hybrid representation of a metric Euclidean environment map and a topological map.
  • One embodiment of a disclosed system includes a scene flow module, a location recognition module, a local adjustment module and a global adjustment module.
  • the scene flow module is for detecting and tracking video features of the frames of an input video sequence.
  • the scene flow module is also configured to identify multiple keyframes of the input video sequence and add the identified keyframes into an initial environment map of the input video sequence.
  • the environment map is represented by a keyframe pose graph comprising the keyframes as the nodes of the graph, and neighboring keyframes are connected by edges in the graph.
  • the location recognition module is for detecting loop closures in the environment map.
  • the local adjustment module enforces local metric properties of the keyframes in the environment map, and the global adjustment module is for optimizing the entire environment map subject to global metric properties of the keyframes in the keyframe pose graph.
  • a solution to the online mapping of large-scale environments described above is an improved SLAM system that uses a hybrid representation of a metric Euclidean environment map and a topological map.
  • the system models inter-sub-map relationships of the online mapping through topological transformations by effectively summarizing the mapping constraints into the transformation.
  • the system reduces the number of cameras by selecting keyframes and using segments of keyframes of an input video stream via global adjustment.
  • the system encodes the constraints between sub-maps through the topological transformations between the sub-maps. Compared with existing methods (e.g., the FrameSLAM method), the system does not depend on the linearization of the projection function to create the marginalized constraints, and does not suffer from inaccuracy in the linearization.
  • the improved SLAM system achieves a globally metric solution while maintaining efficiency during the processing of the sub-maps by optimizing the constraints of adjacent sub-maps through iterative optimization of the non-linear constraints.
  • the adjacent sub-maps have a topological transformation defined between them.
  • Figure 1 illustrates a computer system 100 that implements an improved SLAM system according to one embodiment of the invention.
  • the computer system 100 deploys a hybrid mapping method by combining the benefits of metric Euclidean maps and topological maps, namely the locally-metric globally topological mapping, to generate the environment map of an input video stream.
  • the environment map is represented as a graph of the keyframes (nodes) of the input video stream and the relative pose between keyframes (edges), like the topological approach.
  • the main distinction from existing approaches is that the computer system 100 enforces the metric properties by enforcing the locally metric property at all times via local adjustment, and the globally metric property via global adjustment.
  • the computer system 100 comprises a memory 120, an input controller 130, an output controller 140, a processor 150 and a data store 160.
  • the computer system 100 is configured to receive an input video 110 for building up a map of the environment contained in the input video 110.
  • the map of the environment contained in the input video 110 is referred to as the "environment map of the input video" from herein.
  • the environment map can be used to determine a location within the environment and to depict the environment for planning and navigation by robots and/or autonomous vehicles.
  • the input video 110 comprises multiple video frames with various motion characteristics.
  • the input videos 110 are captured by two stereo rigs, one of which, with a 7.5 cm baseline and 110° horizontal field of view, is mounted in the head of a humanoid robot.
  • the second stereo rig has a baseline of 16 cm and a 95° horizontal field of view and is mounted at the front of an electric cart.
  • the effective image resolution after rectification is 640 x 360 pixels.
  • the input videos are recorded at 12-45 frames per second (fps).
  • An example input video 110 is a video sequence taken by a humanoid robot walking in a large building. There is a corridor with very few features, in which case the motion estimation becomes inaccurate. Due to the robot's motion characteristics, the camera experiences shaking and vibrations.
  • the proposed online environment mapping demonstrates the efficiency of feature tracking and accuracy of global geometry of the mapping performed by the computer system 100.
  • Another two example input videos 110 are an indoor video sequence and an outdoor video sequence taken from a moving electric cart.
  • the cart-outdoor sequence contains a very long travel around a building, and the accumulated motion estimation error is corrected when loops are detected.
  • the cart-indoor sequence has a depth range of the tracked features ranging from very close to far, and contains a significant number of loops.
  • the proposed solution shows that it keeps the local geometry in the map metric and correct, whereas the global metric property is improved as global adjustment progresses.
  • the memory 120 stores data and/or instructions that may be executed by the processor 150.
  • the instructions may comprise computer program code for performing any and/or all of the techniques described herein.
  • the memory 120 may be a DRAM device, a static random access memory (SRAM), Flash RAM (non-volatile storage), combinations of the above, or some other memory device known in the art.
  • the memory 120 comprises a scene flow module 122, a location recognition module 124, a local adjustment module 126 and a global adjustment module 128. Other embodiments may contain different functional modules and different number of modules.
  • the scene flow module 122 is configured to detect and extract spatio-temporal salient features of the input video 110. Salient features of a video sequence can localize the motion events of the video sequence.
  • the scene flow module 122 is further configured to find inlier features among the tracked features and an initial three-dimensional (3D) camera pose. Inlier features of the input video 110 are the features whose distribution can be modeled by some set of model parameters used to track the salient features of the input video 110.
  • the scene flow module 122 is further configured to create keyframes based on the optimized inlier features and generate an initial environment map represented by a keyframe pose graph.
  • the keyframe pose graph is described in detail below.
  • the location recognition module 124 is configured to find possible loop closures in the environment map.
  • a loop closure occurs when the computer system 100 revisits a previously-captured location.
  • the location recognition module 124 uses feature descriptors computed on the tracked feature points to find the loop closures.
  • Candidate keyframes of the input video 110 are selected based on the feature descriptors, where a candidate keyframe with the number of inliers above a given threshold value is chosen as the location recognition result and is added to the keyframe pose graph.
  • the local adjustment module 126 is configured to perform a windowed bundle adjustment of the recently added keyframes of the input video 110.
  • the global adjustment module 128 is configured to optimize the keyframe pose graph and generate an optimized environment map of the input video 110. The local adjustment and global optimization processes are further described below.
  • Figure 2 is a system level flowchart of generating a map of the environment of an input video stream in real-time according to one embodiment of the invention.
  • the computer system 100 receives 210 an input video from a camera system and detects and tracks 212 the video features (e.g., the salient motion events) in the input video.
  • the computer system 100 computes 214 the motion estimation of the camera system (e.g., the initial 3D camera pose) and adds 216 identified keyframes of the input video into an initial environment map represented by a keyframe pose graph.
  • the computer system 100 detects 218 the loop closures in the environment map and enhances 220 the local geometry around recently added keyframes of the environment map.
  • the computer system 100 further optimizes 222 the environment map with global metric properties and generates 224 an optimized environment map of 3D points and landmarks detected in the input video.
  • the computer system 100 stores 226 the optimized environment map for various online environment mapping applications.
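The flowchart steps above (210-226) can be sketched as a simple processing loop. The following Python sketch is illustrative only: the function names and their trivial stand-in bodies are assumptions for demonstration, not the patent's actual algorithms.

```python
# Minimal runnable sketch of the Figure 2 processing loop (steps 210-226).
# All helper implementations are trivial stand-ins, not the patent's methods.

def detect_and_track(frame):                 # step 212: feature tracking
    return [(x, x + 1) for x in frame]       # fake (feature, match) pairs

def estimate_motion(features):               # step 214: camera pose estimate
    return len(features)                     # stand-in 1-D "pose"

def is_keyframe(index, interval=3):          # toy keyframe-selection heuristic
    return index % interval == 0

def online_mapping(frames):
    env_map = {"keyframes": [], "edges": []}
    for i, frame in enumerate(frames):
        features = detect_and_track(frame)
        pose = estimate_motion(features)
        if is_keyframe(i):
            if env_map["keyframes"]:         # link new keyframe to the previous one
                env_map["edges"].append((len(env_map["keyframes"]) - 1,
                                         len(env_map["keyframes"])))
            env_map["keyframes"].append((i, pose))   # step 216
            # steps 218-222 (loop closure, local/global adjustment) omitted here
    return env_map

m = online_mapping([[1, 2]] * 7)
```

Running this on seven fake frames yields three keyframes (frames 0, 3 and 6) chained by two edges, mirroring the incremental graph construction described below.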
  • the environment map is represented as a keyframe pose graph, whose nodes are the keyframes of an input video and whose edges represent the relative pose between two keyframes. More precisely, an edge a → b with P_ab represents a link from node a to node b with the associated 3D Euclidean transformation P_ab, where P_ab is a rigid-body transformation between the two keyframes' coordinate frames.
  • the keyframe pose graph 300-A shows the topological structure of an example environment of an input video.
  • the keyframe pose graph shown in 300-A includes multiple keyframes (e.g., keyframe 302a and keyframe 302b) and landmarks associated with the keyframes (e.g., landmark 304 associated with the keyframe 302a).
  • the keyframes are connected (e.g., the link between the keyframes 302a and 302b).
  • a landmark is associated with one or more keyframes (e.g., landmark 304 associated with keyframe 302a) and the association is represented by a link between the landmark and each of the associated keyframes (e.g., the link 306 between the landmark 304 and keyframe 302a).
  • the environment map is incrementally constructed as the camera moves to capture the input video sequence.
  • Most keyframes are linked to the previous keyframes via commonly observed landmarks (e.g., the link between keyframes 302a and 302b).
  • the location recognition module 124 finds additional links between keyframes, which create loops in the keyframe pose graph (e.g., the dashed line 308 shown in Figure 300-A).
  • the landmarks are attached to an anchor keyframe, where the landmarks are first observed (e.g., the link 306 between the landmark 304 and the keyframe 302a).
  • an anchor keyframe for a landmark is selected randomly from the keyframes that observed the landmark.
  • Each landmark's position in the environment map is represented as a homogeneous 4-vector x in the anchor keyframe's coordinate system.
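The keyframe pose graph and landmark anchoring described above can be captured in a minimal data structure. The class and field names below are hypothetical; the patent does not specify an implementation.

```python
# Minimal sketch of the keyframe pose graph: keyframes as nodes, relative
# 3D transforms on edges, and landmarks stored in an anchor keyframe's frame.
# All names are illustrative assumptions, not from the patent.

IDENTITY = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

class PoseGraph:
    def __init__(self):
        self.keyframes = []       # node ids in creation order
        self.edges = {}           # (a, b) -> relative 3D transform P_ab
        self.landmarks = {}       # id -> (anchor keyframe, homogeneous 4-vector)

    def add_keyframe(self, kf):
        self.keyframes.append(kf)

    def link(self, a, b, P_ab):
        self.edges[(a, b)] = P_ab            # edge a -> b with transform P_ab

    def add_landmark(self, lid, anchor_kf, x_hom):
        # the landmark position lives in the anchor keyframe's coordinate system
        self.landmarks[lid] = (anchor_kf, x_hom)

g = PoseGraph()
g.add_keyframe(0)
g.add_keyframe(1)
g.link(0, 1, IDENTITY)
g.add_landmark("L0", anchor_kf=0, x_hom=[1.0, 2.0, 3.0, 1.0])
```

Loop closures found by location recognition would simply add further `link` calls between non-consecutive keyframes.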
  • the metric property of the environment map is embedded into the keyframe pose graph.
  • the metric embedding of the keyframe pose graph is constructed as follows in Table I:
  • Step 3: for each landmark l and its anchor keyframe c, embed the landmark position using the anchor keyframe's embedded pose P_c.
  • P a denotes the pose of a keyframe a in the embedded space.
  • ||P||_t denotes the norm of the translation component of P, and d(a_0, a) is the geodesic distance from a_0 to a on the keyframe pose graph.
  • the geodesic distance from a_0 to a on the keyframe pose graph is the number of edges between a_0 and a along a shortest path connecting them.
  • the metric embedding procedure illustrated above performs weighted breadth first search of the keyframe pose graph from a reference keyframe and embeds the keyframes according to the order of the breadth first search.
  • the landmarks are embedded using their anchor keyframes' embedded pose.
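The embedding pass described above amounts to a breadth-first traversal from a reference keyframe that composes relative poses along graph edges. The sketch below uses 2-D translation-only poses for brevity (the patent uses full 3D Euclidean transforms and a weighted search order); the function names are assumptions.

```python
# Sketch of the metric-embedding pass: breadth-first search from a reference
# keyframe, composing relative poses along edges. Poses here are simplified
# to 2-D translations; the first (shortest-path) visit fixes each pose.
from collections import deque

def embed(edges, reference):
    """edges: {(a, b): (dx, dy)} is the relative pose of b in a's frame."""
    adj = {}
    for (a, b), (dx, dy) in edges.items():
        adj.setdefault(a, []).append((b, (dx, dy)))
        adj.setdefault(b, []).append((a, (-dx, -dy)))   # invert when going backwards
    poses = {reference: (0.0, 0.0)}                     # reference sits at the origin
    queue = deque([reference])
    while queue:
        a = queue.popleft()
        for b, (dx, dy) in adj.get(a, []):
            if b not in poses:                          # first visit wins (BFS order)
                poses[b] = (poses[a][0] + dx, poses[a][1] + dy)
                queue.append(b)
    return poses

p = embed({(0, 1): (1.0, 0.0), (1, 2): (0.0, 2.0)}, reference=0)
```

Note how a different choice of `reference` would shift and re-root the embedded map, which is why (as observed below) the embedded map depends on the reference keyframe.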
  • Figure 300-B shows an example of the embedded keyframe pose graph corresponding to the keyframe pose graph in Figure 300-A, where 310 is an example of a metric embedded keyframe and 312 is a landmark. It is noted that the embedded maps of a keyframe pose graph may be different depending on the choice of the reference keyframes, and there is no guarantee that a loop in an initial map represented by the keyframe pose graph remains a valid loop in the embedded map. If there is metric inconsistency in a loop (e.g., when the combined transformation along a loop is not an identity transformation), the accumulated error will break the farthest link from the reference keyframe.
  • the proposed computer system 100 improves the artifact in topological mapping by enforcing the metric property through local and global adjustment.
  • the hybrid approach implemented by the computer system 100 is able to maintain the benefit of topological map (e.g., instant loop closure), whereas the map is enforced to be metrically correct after local and global adjustment.
  • a new keyframe is selected if it provides the majority of changes in the environment map with its nearby keyframes that have commonly visible landmarks. In one embodiment, the change is computed through the local adjustment module 126, which improves the estimated motion of a current keyframe and ensures a locally metric map around the current keyframe's location.
  • the local adjustment module 126 is configured to resolve the observed inconsistencies in time, at least locally.
  • the local adjustment module 126 updates the links to active keyframes and the positions of active landmarks.
  • the most recent w keyframes are initially selected as the active keyframes, where w is a window size parameter, typically 5-10. If there are links between one or more additional keyframes and the initial active keyframes, the additional keyframes are also added to the active keyframe set.
  • the size of the active keyframe set is bounded because the number of active keyframes is at most twice the window size w, due to the fact that the location recognition adds no more than one additional link per keyframe.
  • all landmarks visible from the active keyframes are used in the optimization as the active landmarks. All other keyframes which have observations of the active landmarks are included as fixed keyframes that can use all available observations.
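The active-window selection just described can be sketched directly: take the most recent w keyframes, then pull in any keyframes linked to them (e.g., by loop-closure edges). The names below are illustrative assumptions.

```python
# Sketch of active-keyframe selection for local adjustment: the most recent
# w keyframes plus any keyframes linked to them. Since location recognition
# adds at most one extra link per keyframe, the result stays near 2*w in size.

def active_set(keyframes, links, w):
    """keyframes: ids in creation order; links: set of (a, b) edges."""
    active = set(keyframes[-w:])               # most recent w keyframes
    extra = set()
    for a, b in links:
        if a in active:
            extra.add(b)                       # keyframe linked into the window
        if b in active:
            extra.add(a)
    return active | extra

kfs = list(range(10))
loops = {(9, 2)}                               # a loop closure back to keyframe 2
s = active_set(kfs, loops, w=5)
```

With a window of 5 over 10 keyframes and one loop-closure edge (9, 2), the active set is the recent keyframes 5-9 plus the revisited keyframe 2.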
  • Table II illustrates one embodiment of local adjustment for the new keyframes.
  • the local adjustment module 126 uses a standard sparse bundle adjustment algorithm in the embedded metric space for optimization. After applying the Schur complement for local adjustment optimization, the largest linear system to be solved by the local adjustment module 126 has at most 12 × w variables, where w is a window size parameter.
  • the number of landmarks and fixed keyframes affects the performance through the increased number of observations, but in a usual setup, the local adjustment runs efficiently.
  • the explicit topological structure is not used anymore, but it still remains in the observations that associate keyframes and landmarks.
  • the same topological structure is used implicitly through the Jacobian matrices for keyframes and landmarks.
  • the optimized keyframes and landmarks are imported back into the environment map represented by the keyframe pose graph with local adjustment.
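The Schur complement step mentioned above (eliminating landmark variables so that only a small camera system remains to be solved) can be illustrated with a toy system in which both the camera block and the landmark block are scalars; in practice they are 6x6 and 3x3 blocks per camera and landmark. The numbers are made up for illustration.

```python
# Toy illustration of the Schur-complement trick from sparse bundle
# adjustment: eliminate the landmark variable, solve the reduced camera
# system, then back-substitute. Scalar blocks stand in for matrix blocks.

def schur_solve(Hcc, Hcl, Hll, bc, bl):
    S = Hcc - Hcl * (1.0 / Hll) * Hcl          # reduced (camera-only) system
    rhs = bc - Hcl * (1.0 / Hll) * bl
    xc = rhs / S                               # solve for the camera block
    xl = (bl - Hcl * xc) / Hll                 # back-substitute the landmark
    return xc, xl

xc, xl = schur_solve(Hcc=4.0, Hcl=1.0, Hll=2.0, bc=6.0, bl=4.0)

# residuals of the original 2x2 system [[4, 1], [1, 2]] x = [6, 4] should vanish
r1 = 4.0 * xc + 1.0 * xl - 6.0
r2 = 1.0 * xc + 2.0 * xl - 4.0
```

Because landmarks vastly outnumber keyframes, eliminating them first is what keeps the largest linear system at roughly 12 × w variables as stated above.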
  • Figure 4A illustrates an example of keyframe pose graphs with local adjustment according to one embodiment of the invention.
  • the left part of Figure 4A shows a keyframe pose graph having a window 412 of 5 active keyframes and 5 fixed keyframes (418) of an initial environment map without local adjustment.
  • the fixed keyframes are keyframes outside the current processing window but with observations of the active landmarks.
  • the fixed keyframes can use all available observations of landmarks in the local adjustment.
  • the local adjustment module 126 performs a sparse bundle adjustment algorithm on the initial environment map to enforce metric constraints locally.
  • the right side of Figure 4A shows the environment map with local adjustment, which updates the links to the active keyframes and the positions of the active landmarks.
  • 402a, 402b and 402c are active landmarks observed by the active keyframes of the current processing window.
  • the active landmark 402b is observed by keyframes 404a, 404b and 404c.
  • the association between the active landmark 402b and the keyframes 404a, 404b and 404c are represented by the links 406a, 406b and 406c.
  • Some keyframes and landmarks (e.g., keyframe 414 and landmark 416) are not used in the local optimization.
  • Figure 4B illustrates an example of a keyframe pose graph with detected loop closures (e.g., loop 408) and its corresponding keyframe pose graph with local adjustment.
  • the local adjustment described above guarantees that the environment map is locally metric, but the entire map may still not be metric due to the errors along the detected loop closures.
  • Achieving global metric consistency (i.e., consistency of the relative rotation and translation among multiple observations of a scene) is more difficult.
  • One solution is to embed the entire map into the metric space, optimize the embedded structure, and update the result back into the topological map. This is fundamentally identical to the local adjustment step described above, but when a large number of keyframes and landmarks exist, this may take significant computation time and may have difficulty converging to the right map.
  • the global adjustment module 128 is configured to use a novel divide-and-conquer strategy to efficiently solve the global adjustment problem.
  • a disjoint set of keyframes forms a segment, which is constructed using geodesic distance on the keyframe pose graph.
  • the global adjustment module 128 iterates local segment-wise optimization and global segment optimization as follows:
  • segments are treated as rigid bodies in embedding and optimization in Step 3.
  • s_k denotes a segment-wise six-degree-of-freedom 3D rigid motion, and the projected coordinate of a landmark into keyframe j in segment k is obtained by applying s_k together with the keyframe's pose in the projection function.
  • the proposed global adjustment has several advantages over existing methods. For example, using existing nested dissection with boundary variables has a serious problem: most of the variables become boundary variables when the graph is not very sparse and the segmentation is fine. Long tracks of features induce dependencies among all keyframes that observe common landmarks, and the sparsity is significantly reduced. The proposed solution does not have this issue since it treats each segment as a virtual camera, so the size of the global optimization does not depend on the sparsity of the environment map.
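The segment-as-rigid-body idea can be sketched as follows: during global adjustment, each segment's keyframes move together under a single segment-wise motion, so the global problem has one motion variable per segment rather than one per keyframe. Translation-only 2-D motions and all names below are simplifying assumptions.

```python
# Sketch of segments treated as rigid bodies in global adjustment: a single
# per-segment motion is applied to every keyframe in that segment, so the
# global step optimizes only the (few) segment motions.

def apply_segment_motions(keyframe_poses, segments, motions):
    """segments: {segment id: [keyframe ids]}; motions: {segment id: (dx, dy)}."""
    adjusted = dict(keyframe_poses)
    for seg, kfs in segments.items():
        dx, dy = motions[seg]
        for kf in kfs:
            x, y = adjusted[kf]
            adjusted[kf] = (x + dx, y + dy)    # the whole segment moves rigidly
    return adjusted

poses = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (5.0, 0.0)}
segs = {"A": [0, 1], "B": [2]}
out = apply_segment_motions(poses, segs, {"A": (0.0, 0.0), "B": (-1.0, 0.5)})
```

In the full method the per-segment motions would themselves be the unknowns of a joint optimization over all segments and landmarks; here they are simply given.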
  • Figure 5 is an example of global adjustment procedure of an environment map according to one embodiment of the invention.
  • the keyframe pose graph 510 is partitioned into 3 keyframe segments, 522, 524 and 526, and the segments are embedded into a metric space 520.
  • Each segment of keyframes is optimized by a local adjustment algorithm if necessary.
  • the global adjustment module 128 adjusts the segments' poses and landmarks' positions assuming that the segments are moving rigidly. For example, segment 522 is adjusted within the segment (e.g., segment adjustment 532). Similarly, segment 524 and segment 526 are adjusted within their respective segments (e.g., segment adjustment 534 and segment adjustment 536).
  • the lines in each segment adjustment show the association of the landmark and keyframes that commonly observe the landmark.
  • the global adjustment module 128 generates an updated environment map represented by the globally adjusted keyframe pose graph 540.
  • Figure 6 is an example computer system 600 for online environment mapping using a hybrid representation of a metric Euclidean environment map and a topological map on a robot platform according to one embodiment of the invention.
  • the system 600 takes an input video 602 and generates an environment map of sparse 3D point landmarks 612 contained in the input video 602. In one embodiment, the input video 602 is a calibrated stereo video stream captured by a camera system with a pair of stereo cameras.
  • the system 600 has four major components: scene flow module 604, location recognition module 606, local adjustment module 608 and global adjustment module 610.
  • All four components 604, 606, 608 and 610 are executed in parallel to minimize latency and to maximize performance throughput.
  • Processing parameters such as video frame feature descriptors and keyframe identifications are propagated between modules using standard message passing mechanisms (e.g., remote procedure calls).
  • the scene flow module 604 is responsible for detecting and tracking salient features in the input video stream 602, finding inlier features among the tracked features and computing the initial six-degree-of-freedom motion estimates of the camera system.
  • the six-degree-of-freedom motion estimates of the camera system constitute the pose estimates of the camera system.
  • the scene flow module 604 processes each video frame of the input video 602 by detecting and tracking salient features in the video frame, finding inlier features among the tracked features and computing the motion estimates of the camera system with respect to the video frame being processed. By processing each video frame, the robot has the pose estimates of the camera system at all times during the environment map generation.
  • the scene flow module 604 uses a corner detector that is limited to detection on edges for the feature detection step. Using the corner detector ensures that features are placed on the true corners in the scene.
  • the scene flow module 604 tracks the detected features using two 2D Kanade-Lucas-Tomasi (KLT) feature trackers on the left and right cameras' video streams separately.
  • KLT: 2D Kanade-Lucas-Tomasi
  • correspondences of the features are established using the normalized sum of squared differences (SSD) of the initially detected features.
  • SSD: normalized sum of squared differences
  • the scene flow module 604 computes the initial 3D position of a landmark using the disparity from the stereo feature match (e.g., distances among observed key points of the stereo feature). As the camera moves, the local adjustment module 608 and the global adjustment module 610 update the landmark position using all available observations from different viewpoints of a scene.
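Initializing a landmark from stereo disparity can be shown with the standard rectified-stereo relation Z = f * B / d. The focal length and pixel coordinates below are made-up example values; only the 16 cm baseline echoes the cart-mounted rig described earlier.

```python
# Sketch of landmark initialization from a rectified stereo match, using
# the standard pinhole/stereo relation Z = f * B / d. All numeric values
# here are illustrative, not calibration data from the patent.

def triangulate(u_left, u_right, v, f, cx, cy, baseline):
    d = u_left - u_right                     # disparity in pixels
    Z = f * baseline / d                     # depth along the optical axis
    X = (u_left - cx) * Z / f                # back-project the pixel ray
    Y = (v - cy) * Z / f
    return X, Y, Z

# e.g. a 10-pixel disparity on a 16 cm baseline rig with f = 400 px
X, Y, Z = triangulate(u_left=340.0, u_right=330.0, v=180.0,
                      f=400.0, cx=320.0, cy=180.0, baseline=0.16)
```

Here the 10-pixel disparity yields a depth of 6.4 m; smaller disparities (more distant points) are correspondingly less certain, which is why later adjustment refines these initial positions.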
  • the scene flow module 604 employs a 3-point algorithm embedded in a random sample consensus (RANSAC) procedure for robust motion estimation and outlier rejection.
  • RANSAC finds the initial 3D camera pose and inlier features; the 3D pose is then enhanced with a non-linear optimization using all inlier features, and a new set of inliers is found with the enhanced pose estimate.
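The hypothesise-and-verify structure of the RANSAC step can be sketched with a much simpler model: a 1-D translation fit to noisy correspondences (the patent's solver is a 3-point pose algorithm). For determinism this sketch tries every minimal sample rather than random ones; all names are illustrative.

```python
# RANSAC-style hypothesise-and-verify sketch: fit a model to a minimal
# sample, count inliers, keep the best hypothesis. The model here is a
# 1-D translation; real RANSAC draws the minimal samples at random.

def fit_translation(pairs, threshold=0.1):
    best_t, best_inliers = None, []
    for a, b in pairs:                       # minimal sample: one correspondence
        t = b - a                            # hypothesised translation
        inliers = [(p, q) for p, q in pairs if abs((q - p) - t) < threshold]
        if len(inliers) > len(best_inliers):
            best_t, best_inliers = t, inliers
    return best_t, best_inliers

pairs = [(float(x), x + 2.0) for x in range(8)] + [(0.0, 9.0)]   # one outlier
t, inliers = fit_translation(pairs)
```

The outlier correspondence supports only itself, so the translation of 2.0 wins with 8 inliers; in the full system the winning pose is then refined by non-linear optimization over those inliers, as described above.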
  • a keyframe is created and added to the map by the scene flow module 604.
  • the new keyframe is passed to the location recognition module 606 and the local adjustment module 608 for further processing.
  • Newly established features are added as new landmarks, and the landmarks with too few observations are removed.
  • the location recognition module 606 is responsible for finding possible loop closures in the environment map. A loop closure is detected if the system 600 is revisiting a previously-captured location.
  • the location recognition uses an upright version of the speeded-up robust features (USURF-64) descriptor computed on the tracked feature points for location recognition. This is possible because the scale of each feature can be computed from the inverse of the depth of the feature. The advantages are increased performance by skipping the interest point detection and better stability in determining the feature's scale.
  • the descriptors are computed using integral image techniques and the descriptors are attached to the landmark observation.
  • the vocabulary tree is trained off-line from millions of descriptors from various indoor and outdoor training videos.
  • the location recognition module 606 performs the relative pose estimation using RANSAC with the 3-point algorithm, similarly to the scene flow module 604, and the candidate with the most inliers (e.g., above a given threshold) is chosen as the location recognition result.
  • the obtained pose estimate and the inlier set are improved via a non-linear optimization. If a match is successfully found, a new link connecting the current keyframe to the detected keyframe is added into the keyframe pose graph 612.
  • the local adjustment module 608 performs a windowed bundle adjustment of the recently added keyframes as described above in the Section "Local Adjustment."
  • the local adjustment module 608 uses the standard sparse bundle adjustment algorithm with the pseudo-Huber norm to perform local adjustment.
  • the global adjustment module 610 performs the optimization of the entire environment map.
  • To make global adjustment use as many keyframes as possible, in one embodiment, the global optimization iterates only once, and a new segmentation is found using all available keyframes, including keyframes newly added after the previous global optimization.
  • Embodiments of the invention provide a solution to online environment mapping by using a hybrid representation of a fully metric Euclidean environment map and a topological map of an input video sequence.
  • the experimental results show that the proposed local adjustment handles the topological changes of the input video sequence successfully.
  • the topological changes are reflected in the optimization, where the loop closure creates additional constraints among keyframes.
  • the resulting map is only locally metric. Severe misalignments may even prevent traditional bundle adjustment from converging to the right map.
  • the proposed global adjustment overcomes the deficiencies of existing methods. For each iteration, the global adjustment segments the environment map into several keyframe segments, and individual segments are optimized locally. The global adjustment then aligns the optimized segments jointly with all the landmarks.
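The vocabulary tree mentioned above quantizes each feature descriptor into a visual word by descending a hierarchy of cluster centroids, so that keyframes observing the same words can be retrieved as loop-closure candidates. Below is a minimal sketch of that lookup only; the `VocabNode` class and the toy two-level centroids are illustrative assumptions, not the tree trained off-line from millions of descriptors that the patent describes.

```python
class VocabNode:
    """One node of a toy vocabulary tree.

    Interior nodes hold a list of (centroid, child) pairs; a leaf holds a
    visual-word id instead of children.
    """
    def __init__(self, children=None, word_id=None):
        self.children = children or []   # list of (centroid tuple, VocabNode)
        self.word_id = word_id

def quantize(node, desc):
    """Descend the tree from the root, at each level following the child
    whose centroid is nearest (squared Euclidean distance) to the
    descriptor, and return the visual-word id stored at the leaf."""
    while node.children:
        node = min(node.children,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(c[0], desc)))[1]
    return node.word_id
```

In a real system the word ids index an inverted file mapping words to the keyframes in which they were observed, which is how candidate keyframes for loop closure are retrieved.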
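The RANSAC verification step described above (sample a minimal 3-point set, hypothesize a pose, keep the hypothesis with the most inliers, and reject the candidate if the count is below a threshold) can be sketched as follows. To stay self-contained, the minimal-sample model here is a pure 3D translation, a deliberately simplified stand-in for the full 3-point absolute-pose solver; the function name and the `iters`, `thresh`, and `min_inliers` parameters are illustrative assumptions.

```python
import random

def ransac_translation(pts_a, pts_b, iters=200, thresh=0.1, min_inliers=10):
    """Toy RANSAC loop: hypothesize a 3D translation from a minimal
    3-point sample of correspondences and keep the hypothesis with the
    most inliers.  Returns (None, []) if the best candidate has too few
    inliers, mirroring the threshold test in the text above."""
    best_t, best_inliers = None, []
    for _ in range(iters):
        idx = random.sample(range(len(pts_a)), 3)   # minimal 3-point sample
        # hypothesize the translation as the mean offset over the sample
        t = tuple(sum(pts_b[i][k] - pts_a[i][k] for i in idx) / 3.0
                  for k in range(3))
        # score the hypothesis by counting correspondences it explains
        inliers = [i for i in range(len(pts_a))
                   if sum((pts_a[i][k] + t[k] - pts_b[i][k]) ** 2
                          for k in range(3)) < thresh ** 2]
        if len(inliers) > len(best_inliers):
            best_t, best_inliers = t, inliers
    if len(best_inliers) < min_inliers:
        return None, []   # candidate rejected: not enough inliers
    return best_t, best_inliers
```

The surviving inlier set is exactly what the subsequent non-linear refinement step would start from.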
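The pseudo-Huber norm used in the sparse bundle adjustment is a smooth robust cost: it behaves quadratically for small residuals and linearly for large ones, which bounds the influence of outlier observations on the optimization. A minimal sketch (the `delta` transition parameter is an assumed tuning value):

```python
import math

def pseudo_huber(r, delta=1.0):
    """Pseudo-Huber cost of a residual r.

    Approximately r**2 / 2 for |r| << delta and approximately
    delta * |r| for |r| >> delta, with a smooth transition in between,
    so outlier residuals grow only linearly in cost.
    """
    return delta ** 2 * (math.sqrt(1.0 + (r / delta) ** 2) - 1.0)
```

In a bundle adjuster this cost is applied to each reprojection residual in place of the plain squared error.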

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a system and method for online mapping of large-scale environments using a hybrid representation of a metric Euclidean environment map and a topological map. The system comprises a scene flow module, a location recognition module, a local adjustment module, and a global adjustment module. The scene flow module is configured to detect and track video features of the frames of an input video sequence. The scene flow module is also configured to identify multiple keyframes of the input video sequence and to add the identified keyframes to an initial environment map of the input video sequence. The location recognition module is configured to detect loop closures in the environment map. The local adjustment module enforces local metric properties of the keyframes in the environment map, and the global adjustment module is configured to optimize the entire environment map subject to global metric properties of the keyframes in the keyframe pose graph.
PCT/US2012/040028 2011-05-31 2012-05-30 Cartographie d'environnement en ligne WO2012166814A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201161491793P 2011-05-31 2011-05-31
US61/491,793 2011-05-31

Publications (1)

Publication Number Publication Date
WO2012166814A1 true WO2012166814A1 (fr) 2012-12-06

Family

ID=47259829

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2012/040028 WO2012166814A1 (fr) 2011-05-31 2012-05-30 Cartographie d'environnement en ligne

Country Status (2)

Country Link
US (1) US8913055B2 (fr)
WO (1) WO2012166814A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015050773A1 (fr) * 2013-10-04 2015-04-09 Qualcomm Incorporated Suivi d'objet basé sur des données de carte environnementale construites de façon dynamique

Families Citing this family (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8751151B2 (en) * 2012-06-12 2014-06-10 Trx Systems, Inc. System and method for localizing a trackee at a location and mapping the location using inertial sensor information
JP6044079B2 (ja) * 2012-02-06 2016-12-14 ソニー株式会社 情報処理装置、情報処理方法及びプログラム
US9420265B2 (en) * 2012-06-29 2016-08-16 Mitsubishi Electric Research Laboratories, Inc. Tracking poses of 3D camera using points and planes
US10262462B2 (en) 2014-04-18 2019-04-16 Magic Leap, Inc. Systems and methods for augmented and virtual reality
US9952042B2 (en) 2013-07-12 2018-04-24 Magic Leap, Inc. Method and system for identifying a user location
US9036044B1 (en) * 2013-07-22 2015-05-19 Google Inc. Adjusting camera parameters associated with a plurality of images
US9811731B2 (en) 2013-10-04 2017-11-07 Qualcomm Incorporated Dynamic extension of map data for object detection and tracking
DE102014002821A1 (de) * 2014-02-26 2015-08-27 Audi Ag Verfahren und ein System zum Lokalisieren einer mobilen Einrichtung
US20150330054A1 (en) * 2014-05-16 2015-11-19 Topcon Positioning Systems, Inc. Optical Sensing a Distance from a Range Sensing Apparatus and Method
US9741140B2 (en) * 2014-05-19 2017-08-22 Microsoft Technology Licensing, Llc Fast solving for loop closure using a relative state space
US9865061B2 (en) * 2014-06-19 2018-01-09 Tata Consultancy Services Limited Constructing a 3D structure
US10373335B1 (en) * 2014-07-10 2019-08-06 Hrl Laboratories, Llc System and method for location recognition and learning utilizing convolutional neural networks for robotic exploration
US10518879B1 (en) * 2014-07-10 2019-12-31 Hrl Laboratories, Llc System and method for drift-free global trajectory estimation of a mobile platform
US9483879B2 (en) 2014-09-18 2016-11-01 Microsoft Technology Licensing, Llc Using free-form deformations in surface reconstruction
US9472009B2 (en) 2015-01-13 2016-10-18 International Business Machines Corporation Display of context based animated content in electronic map
US10360718B2 (en) 2015-08-14 2019-07-23 Samsung Electronics Co., Ltd. Method and apparatus for constructing three dimensional model of object
JP6775969B2 (ja) * 2016-02-29 2020-10-28 キヤノン株式会社 情報処理装置、情報処理方法、及びプログラム
US10764561B1 (en) 2016-04-04 2020-09-01 Compound Eye Inc Passive stereo depth sensing
CN109478330B (zh) * 2016-06-24 2022-03-29 罗伯特·博世有限公司 基于rgb-d照相机的跟踪系统及其方法
CN106352877B (zh) * 2016-08-10 2019-08-23 纳恩博(北京)科技有限公司 一种移动装置及其定位方法
US10217221B2 (en) 2016-09-29 2019-02-26 Intel Corporation Place recognition algorithm
CN106803271B (zh) * 2016-12-23 2020-04-28 成都通甲优博科技有限责任公司 一种视觉导航无人机的摄像机标定方法及装置
CN106952291B (zh) * 2017-03-14 2020-07-14 哈尔滨工程大学 基于3维结构张量各向异性流驱动的场景流车流量统计与测速方法
US10990829B2 (en) * 2017-04-28 2021-04-27 Micro Focus Llc Stitching maps generated using simultaneous localization and mapping
GB201718507D0 (en) 2017-07-31 2017-12-27 Univ Oxford Innovation Ltd A method of constructing a model of the motion of a mobile device and related systems
CN108986136B (zh) * 2018-07-23 2020-07-24 南昌航空大学 一种基于语义分割的双目场景流确定方法及系统
US10991117B2 (en) * 2018-12-23 2021-04-27 Samsung Electronics Co., Ltd. Performing a loop closure detection
US11960297B2 (en) * 2019-05-03 2024-04-16 Lg Electronics Inc. Robot generating map based on multi sensors and artificial intelligence and moving based on map
KR20210029586A (ko) * 2019-09-06 2021-03-16 엘지전자 주식회사 이미지 내의 특징적 객체에 기반하여 슬램을 수행하는 방법 및 이를 구현하는 로봇과 클라우드 서버
WO2021108626A1 (fr) 2019-11-27 2021-06-03 Compound Eye Inc. Système et procédé de détermination de carte de correspondance
US11069071B1 (en) * 2020-01-21 2021-07-20 Compound Eye, Inc. System and method for egomotion estimation
WO2021150784A1 (fr) 2020-01-21 2021-07-29 Compound Eye Inc. Système et procédé d'étalonnage de caméra
CN112362072B (zh) * 2020-11-17 2023-11-14 西安恒图智源信息科技有限责任公司 一种复杂城区环境中的高精度点云地图创建系统及方法
US20220172386A1 (en) * 2020-11-27 2022-06-02 Samsung Electronics Co., Ltd. Method and device for simultaneous localization and mapping (slam)
US20220287530A1 (en) * 2021-03-15 2022-09-15 Midea Group Co., Ltd. Method and Apparatus for Localizing Mobile Robot in Environment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040167670A1 (en) * 2002-12-17 2004-08-26 Goncalves Luis Filipe Domingues Systems and methods for computing a relative pose for global localization in a visual simultaneous localization and mapping system
US20050025343A1 (en) * 2001-06-18 2005-02-03 Microsoft Corporation Incremental motion estimation through local bundle adjustment
US20060088203A1 (en) * 2004-07-14 2006-04-27 Braintech Canada, Inc. Method and apparatus for machine-vision
US7714895B2 (en) * 2002-12-30 2010-05-11 Abb Research Ltd. Interactive and shared augmented reality system and method having local and remote access
US20100232727A1 (en) * 2007-05-22 2010-09-16 Metaio Gmbh Camera pose estimation apparatus and method for augmented reality imaging

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3970520B2 (ja) * 1998-04-13 2007-09-05 アイマティック・インターフェイシズ・インコーポレイテッド 人間の姿を与えたものを動画化するためのウェーブレットに基づく顔の動きの捕捉
US6487304B1 (en) * 1999-06-16 2002-11-26 Microsoft Corporation Multi-view approach to motion and stereo
US6952212B2 (en) * 2000-03-24 2005-10-04 Telefonaktiebolaget Lm Ericsson (Publ) Frame decimation for structure from motion
US7689321B2 (en) 2004-02-13 2010-03-30 Evolution Robotics, Inc. Robust sensor fusion for mapping and localization in a simultaneous localization and mapping (SLAM) system
US7831433B1 (en) * 2005-02-03 2010-11-09 Hrl Laboratories, Llc System and method for using context in navigation dialog
US7996771B2 (en) * 2005-06-17 2011-08-09 Fuji Xerox Co., Ltd. Methods and interfaces for event timeline and logs of video streams
US20070030396A1 (en) * 2005-08-05 2007-02-08 Hui Zhou Method and apparatus for generating a panorama from a sequence of video frames
US8305430B2 (en) 2005-09-16 2012-11-06 Sri International System and method for multi-camera visual odometry
US20090010507A1 (en) * 2007-07-02 2009-01-08 Zheng Jason Geng System and method for generating a 3d model of anatomical structure using a plurality of 2d images
AU2011305154B2 (en) * 2010-09-24 2015-02-05 Irobot Corporation Systems and methods for VSLAM optimization
US9020187B2 (en) * 2011-05-27 2015-04-28 Qualcomm Incorporated Planar mapping and tracking for mobile devices

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050025343A1 (en) * 2001-06-18 2005-02-03 Microsoft Corporation Incremental motion estimation through local bundle adjustment
US20040167670A1 (en) * 2002-12-17 2004-08-26 Goncalves Luis Filipe Domingues Systems and methods for computing a relative pose for global localization in a visual simultaneous localization and mapping system
US7714895B2 (en) * 2002-12-30 2010-05-11 Abb Research Ltd. Interactive and shared augmented reality system and method having local and remote access
US20060088203A1 (en) * 2004-07-14 2006-04-27 Braintech Canada, Inc. Method and apparatus for machine-vision
US20100232727A1 (en) * 2007-05-22 2010-09-16 Metaio Gmbh Camera pose estimation apparatus and method for augmented reality imaging

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015050773A1 (fr) * 2013-10-04 2015-04-09 Qualcomm Incorporated Suivi d'objet basé sur des données de carte environnementale construites de façon dynamique
CN105593877A (zh) * 2013-10-04 2016-05-18 高通股份有限公司 基于动态地构建的环境地图数据进行物体追踪
US9524434B2 (en) 2013-10-04 2016-12-20 Qualcomm Incorporated Object tracking based on dynamically built environment map data
CN105593877B (zh) * 2013-10-04 2018-12-18 高通股份有限公司 基于动态地构建的环境地图数据进行物体追踪

Also Published As

Publication number Publication date
US20120306847A1 (en) 2012-12-06
US8913055B2 (en) 2014-12-16

Similar Documents

Publication Publication Date Title
US8913055B2 (en) Online environment mapping
Lim et al. Online environment mapping
US10553026B2 (en) Dense visual SLAM with probabilistic surfel map
Strasdat et al. Double window optimisation for constant time visual SLAM
Davison Real-time simultaneous localisation and mapping with a single camera
Newman et al. Outdoor SLAM using visual appearance and laser ranging
WO2019057179A1 (fr) Procédé et appareil de localisation et de cartographie simultanées par slam visuel basés sur une caractéristique de points et de lignes
US9299161B2 (en) Method and device for head tracking and computer-readable recording medium
Qiu et al. AirDOS: Dynamic SLAM benefits from articulated objects
CN103646391A (zh) 一种针对动态变化场景的实时摄像机跟踪方法
WO2014114923A1 (fr) Procédé de détection de parties structurelles d'une scène
Knorr et al. Online extrinsic multi-camera calibration using ground plane induced homographies
Li et al. Review of vision-based Simultaneous Localization and Mapping
Schleicher et al. Real-time hierarchical stereo Visual SLAM in large-scale environments
Zhang et al. Hand-held monocular SLAM based on line segments
CN112802096A (zh) 实时定位和建图的实现装置和方法
Tungadi et al. Robust online map merging system using laser scan matching and omnidirectional vision
Ceriani et al. Pose interpolation slam for large maps using moving 3d sensors
Möller et al. Cleaning robot navigation using panoramic views and particle clouds as landmarks
Lee et al. 2D image feature-based real-time RGB-D 3D SLAM
Fan et al. A nonlinear optimization-based monocular dense mapping system of visual-inertial odometry
Liu et al. Hybrid metric-feature mapping based on camera and Lidar sensor fusion
Ou et al. Targetless Lidar-camera Calibration via Cross-modality Structure Consistency
Dong et al. R-LIOM: reflectivity-aware LiDAR-inertial odometry and mapping
Sibley Relative bundle adjustment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12793151

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12793151

Country of ref document: EP

Kind code of ref document: A1