WO2022171278A1 - Map processing device and method thereof - Google Patents

Map processing device and method thereof

Info

Publication number
WO2022171278A1
Authority
WO
WIPO (PCT)
Prior art keywords
map
based map
features
image
images
Prior art date
Application number
PCT/EP2021/053243
Other languages
French (fr)
Inventor
José ARAÚJO
Ioannis KARAGIANNIS
Paula CARBÓ CUBERO
Sebastian BARBAS LAINA
Original Assignee
Telefonaktiebolaget Lm Ericsson (Publ)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Telefonaktiebolaget Lm Ericsson (Publ) filed Critical Telefonaktiebolaget Lm Ericsson (Publ)
Priority to PCT/EP2021/053243 priority Critical patent/WO2022171278A1/en
Publication of WO2022171278A1 publication Critical patent/WO2022171278A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/74Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30244Camera pose

Definitions

  • The terms “comprise”, “comprising”, “comprises”, “include”, “including”, “includes”, “have”, “has”, “having”, or variants thereof are open-ended, and include one or more stated features, integers, elements, steps, components or functions but do not preclude the presence or addition of one or more other features, integers, elements, steps, components, functions or groups thereof.
  • The common abbreviation “e.g.”, which derives from the Latin phrase “exempli gratia”, may be used to introduce or specify a general example or examples of a previously mentioned item and is not intended to be limiting of such item.
  • The common abbreviation “i.e.”, which derives from the Latin phrase “id est”, may be used to specify a particular item from a more general recitation.
  • Example embodiments are described herein with reference to block diagrams and/or flowchart illustrations of computer-implemented methods, apparatus (systems and/or devices) and/or computer program products. It is understood that a block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions that are performed by one or more computer circuits.
  • These computer program instructions may be provided to a processor circuit of a general purpose computer circuit, special purpose computer circuit, and/or other programmable data processing circuit to produce a machine, such that the instructions, which execute via the processor of the computer and/or other programmable data processing apparatus, transform and control transistors, values stored in memory locations, and other hardware components within such circuitry to implement the functions/acts specified in the block diagrams and/or flowchart block or blocks, and thereby create means (functionality) and/or structure for implementing the functions/acts specified in the block diagrams and/or flowchart block(s).

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)

Abstract

A map processing device (100) performs operations that access a structure-based map (102) containing depth information from a depth sensor (112) and images from a first camera (113). The depth information has a set of data points indicating locations in 3D space of features in the real-world sensed by the depth sensor and corresponding to features captured in the images of the structure-based map. An image-based map (105) is accessed having features extracted from images from a second camera (122) using a localization algorithm. Features are extracted from the images of the structure-based map using the localization algorithm. Features of the image-based map are identified which correspond to the features extracted from the images of the structure-based map. Map elements for the image-based map are generated based on the depth information of the structure-based map that corresponds to the features of the image-based map which are identified as corresponding to the features extracted from the images of the structure-based map.

Description

MAP PROCESSING DEVICE AND METHOD THEREOF
TECHNICAL FIELD
[001] The present disclosure relates to a map processing device, a method performed by a map processing device, and a corresponding computer program product.
BACKGROUND
[002] An important research topic in localization and mapping, also known as Simultaneous Localization and Mapping (SLAM), is performing SLAM with heterogeneous sensor information. It is anticipated that electronic devices will increasingly use heterogeneous sets of sensors to localize device locations relative to maps of the real-world. For example, a basic smartphone performing an augmented reality application may use only a monocular camera for localization. A more advanced smartphone may use a combination of a monocular camera and a Lidar (e.g., Apple iPhone 12 and iPad Pro 12) for localization. Still other advanced devices, such as factory robots or mixed reality headsets, may use multiple monocular cameras, stereo cameras, or a Lidar and camera for localization.
[003] SLAM algorithms can typically be split into image-based algorithms and structure-based algorithms. The image-based algorithms are configured for localization of devices which contain image sensors such as monocular or stereo cameras, while structure-based algorithms are configured for localization of devices which contain a depth sensor which actively senses distance to a real-world feature, such as by bouncing a laser (e.g., Lidar), RF signal (e.g., radar), sound (e.g., ultrasonic sensor), etc. off the feature.
[004] Because of the diverse types of sensors which are being used by devices for localization, situations arise where a device that needs to be localized using an image-based SLAM algorithm does not have access to an existing map for the present device location, although such a map does exist for use by structure-based SLAM algorithms. The device therefore may not be localized relative to that location, or at least not localized as accurately as would be possible if an existing compatible map were available. Hence, there is a need to develop more adaptable SLAM algorithms and systems.
SUMMARY
[005] Some embodiments disclosed herein are directed to a map processing device that includes at least one processor. The at least one processor is configured to perform operations that include accessing a structure-based map comprising depth information from a depth sensor and images from a first camera. The depth information comprises a set of data points indicating locations in three-dimensional (3D) space corresponding to features in the real-world which are sensed by the depth sensor. At least some of the features sensed in the real-world correspond to features captured in the images of the structure-based map. The operations access an image-based map comprising features extracted from images from a second camera using a localization algorithm, and extract the features from the images of the structure-based map using the localization algorithm. The operations identify which of the features of the image-based map correspond to which of the features extracted from the images of the structure-based map. The operations generate map elements for the image-based map based on the depth information of the structure-based map that corresponds to the features of the image-based map which are identified as corresponding to the features extracted from the images of the structure-based map.
[006] In some further embodiments, the operations further include determining a section of the structure-based map comprising depth information corresponding to features in the real-world that correspond to features in the image-based map that do not have assigned locations in the 3D space. The operations then perform the generation of the map elements for the image-based map based on the depth information of the section of the structure-based map, and combine the generated map elements with the image-based map.
[007] Some other related embodiments are directed to a method performed by a map processing device. The method includes accessing a structure-based map comprising depth information from a depth sensor and images from a first camera. The depth information comprises a set of data points indicating locations in 3D space corresponding to features in the real-world sensed by the depth sensor. At least some of the features sensed in the real-world correspond to features captured in the images of the structure-based map. The method accesses an image-based map comprising features extracted from images from a second camera using a localization algorithm, and extracts the features from the images of the structure-based map using the localization algorithm. The method identifies which of the features of the image-based map correspond to which of the features extracted from the images of the structure-based map. The method generates map elements for the image-based map based on the depth information of the structure-based map that corresponds to the features of the image-based map which are identified as corresponding to the features extracted from the images of the structure-based map.
[008] Some other related embodiments are directed to a computer program product including a non-transitory computer readable medium storing program code executable by at least one processor of a map processing device to perform operations. The operations include accessing a structure-based map comprising depth information from a depth sensor and images from a first camera. The depth information comprises a set of data points indicating locations in 3D space corresponding to features in the real-world sensed by the depth sensor. At least some of the features sensed in the real-world correspond to features captured in the images of the structure-based map. The operations access an image-based map comprising features extracted from images from a second camera using a localization algorithm, and extract the features from the images of the structure-based map using the localization algorithm. The operations identify which of the features of the image-based map correspond to which of the features extracted from the images of the structure-based map. The operations generate map elements for the image-based map based on the depth information of the structure-based map that corresponds to the features of the image-based map which are identified as corresponding to the features extracted from the images of the structure-based map.
[009] As will be explained in further detail below, a potential advantage which may be provided by these and other embodiments is that content of a structure-based map can be processed to generate map elements which are used to augment an image-based map. A device, e.g., a server, running a conventional image-based SLAM algorithm can then directly process the generated map elements to perform localization operations for an image-based device that senses the real-world through a camera. Moreover, because the generated map elements may be processed by the image-based SLAM algorithm without changes, the processing can be performed in a computationally efficient manner.
[0010] Other devices, methods, and computer program products according to embodiments will be or become apparent to one with skill in the art upon review of the following drawings and detailed description. It is intended that all such devices, methods, and computer program products be included within this description, be within the scope of the present disclosure, and be protected by the accompanying claims. Moreover, it is intended that all embodiments disclosed herein can be implemented separately or combined in any way and/or combination.
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] Aspects of the present disclosure are illustrated by way of example and are not limited by the accompanying drawings. In the drawings:
[0012] Figure 1 illustrates a simultaneous localization and mapping (SLAM) network computing server or other map processing device that localizes a first device using a structure-based SLAM algorithm and localizes a second device using an image-based SLAM algorithm in accordance with some embodiments of the present disclosure; and
[0013] Figures 2 through 4 illustrate flowcharts of operations performed by a map processing device in accordance with some embodiments of the present disclosure.
DETAILED DESCRIPTION
[0014] Inventive concepts will now be described more fully hereinafter with reference to the accompanying drawings, in which examples of embodiments of inventive concepts are shown. Inventive concepts may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of various present inventive concepts to those skilled in the art. It should also be noted that these embodiments are not mutually exclusive. Components from one embodiment may be tacitly assumed to be present/used in another embodiment.
[0015] Embodiments of the present disclosure are directed to augmenting an image-based map with map elements that are generated based on content of a structure-based map. A mapping device, e.g., a server, running a conventional image-based SLAM algorithm can then directly process the generated map elements to perform localization operations for a device that senses features of the real-world through a camera.
[0016] Some embodiments are now explained in the context of Figure 1 and an example operational scenario. Figure 1 illustrates a map processing device 100 that performs mapping (i.e., map creation) and may further perform localization for a first device 110 using, e.g., a structure-based SLAM algorithm, and which may further perform localization for a second device 120 using, e.g., an image-based SLAM algorithm in accordance with some embodiments of the present disclosure. The first device 110 and the second device 120 can communicate with a map processing device 100 that, in at least some embodiments, performs localization and mapping operations for the first device 110 and the second device 120. Communications between the devices 110 and 120 and the map processing device 100 can be performed through one or more radio access network(s) 130 and one or more public, e.g., Internet, and/or private networks 132.
[0017] Although Figure 1 illustrates a single map processing device 100, which may be embodied in a computing server, that performs mapping operations and may further perform localization operations for both the first and second devices 110 and 120 using a mapping algorithm which may be part of a SLAM algorithm, solutions with more than one map processing device may be envisaged. The mapping algorithm performs map creation, such as a mapping algorithm that may be part of a conventional SLAM algorithm. For instance, a system may include separate map processing devices which each perform mapping operations and may further perform localization operations for different ones of the first and second devices 110 and 120. As an example, the map processing device 100 may be part of a network computing server, such as a SLAM network computing server. Furthermore, at least a portion of the localization and mapping operations may be performed within a mobile device, such as the first device 110 and/or the second device 120, and/or any other computing device.
[0018] The first device 110 includes a depth sensor 112, a first camera 113, a processor 114, a memory 116 storing program code executable by the processor 114, and a wireless transceiver 118 to communicate with a radio access network 130. The depth sensor 112 may be a Lidar sensor, radar, or other sensor that actively senses distance to real-world features. The first camera 113 may be a monocular camera, stereo camera, etc. The depth sensor 112 generates depth information and the first camera 113 captures images as the first device 110 moves. The depth information includes a set of data points indicating locations in three-dimensional (3D) space corresponding to features in the real-world which are sensed by the depth sensor 112. The first camera 113 is arranged and operated so that at least some of the real-world features which are sensed by the depth sensor 112 correspond to features which are captured in the images by the first camera 113. The Intel Realsense Lidar L515, iPhone 12 Pro, and iPad Pro 12 are example devices that include a camera which captures images of the real-world features which are also sensed by a Lidar sensor. The depth information and images are communicated by the wireless transceiver 118 to the map processing device 100.
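To make the notion of depth information as a set of 3D data points concrete, the sketch below back-projects a depth image into 3D points under a pinhole camera model. This is a minimal illustration only; the function name, intrinsics values, and use of NumPy are assumptions and are not specified by the disclosure.

```python
import numpy as np

def depth_image_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (in meters) into a set of 3D data points.

    Assumes a pinhole camera model with intrinsics (fx, fy, cx, cy) and a
    depth image already registered to the camera frame.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    valid = z > 0                      # drop pixels with no depth return
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x[valid], y[valid], z[valid]], axis=-1)  # (N, 3) points

# Example: a synthetic 4x4 depth image at a constant 2 m range
points = depth_image_to_points(np.full((4, 4), 2.0), fx=500, fy=500, cx=2, cy=2)
```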
[0019] In the example of Figure 1, the map processing device 100 includes a processor 109 and a memory 108 storing program code executable by the processor 109. The program code can include a localization and mapping module 101 which updates depth information 104 and may optionally further include and update images 103 in a structure-based map 102 based on the depth information and images received from the first device 110. Alternatively or additionally, operations of the localization and mapping module 101 can be performed by the first device 110 and the second device 120, with the map processing device sending a partial map of the environment to the first device 110 or the second device 120 for processing to perform localization and mapping operations. Still alternatively or additionally, the first device 110 may locally store at least a part of a structure-based map and the second device 120 may locally store at least a part of an image-based map for locally performing localization and mapping operations. Accordingly, the localization and mapping operations can be centrally performed by, e.g., a computing server, or by multiple devices in a distributed manner.
[0020] Although the structure-based map 102 illustrated in Figure 1 includes the depth information 104 and the camera images 103, according to some other embodiments the structure-based map 102 includes the depth information 104 but not camera images. The localization and mapping module 101 can be configured to process camera images and depth information to generate image-based map information without keeping (maintaining a programmed association to) camera information in the structure-based map 102.
[0021] The first device 110 may be, for example, a robotic vacuum cleaner or a factory automated guided vehicle (AGV) that travels through a building while providing depth information and images to the map processing device 100. When the first device 110 is navigating through previously traversed portions of a building, the map processing device 100 can process the received depth information through a structure-based SLAM algorithm, e.g., in a structure-based localization and mapping module 101 in memory 108, to localize the first device 110 relative to the depth information 104 in the structure-based map 102. The localization operations may enable the first device 110 to autonomously navigate through the building. Moreover, when the first device 110 is navigating through a newly traversed portion of the building, the map processing device 100 can use the received depth information and images to augment the structure-based map 102 for later use to localize the first device 110 while navigating through that portion of the building.
[0022] In contrast to the first device 110, the second device 120 does not have a depth sensor. Instead, the second device 120 senses features of the real-world using a second camera 122 that captures images as the second device 120 moves. The second device 120 includes the camera 122, a processor 124, a memory 126 storing program code executable by the processor 124, and a wireless transceiver 128 to communicate with the radio access network 130. The second device 120 provides the images to the map processing device 100. In the example of Figure 1, the map processing device 100 uses an image-based mapping algorithm, e.g., image-based SLAM algorithm, to generate an image-based map 105 storing features extracted from the images received from the second device 120. The images may correspond to what are called keyframes in SLAM literature.
[0023] The second device 120 may optionally include an inertial measurement unit (IMU) 123 or other sensor that generates pose data indicating poses of the camera 122 when the images were captured and/or indicating transformations between the poses of the camera 122 when the images were captured. The pose data may be: 1) absolute pose data from which pose transformation data can be derived; and/or 2) relative pose data. The pose of an image may be defined by the six degree-of-freedom orientation (position and angle) of the camera 122 relative to a defined reference frame when the image was captured. The pose data can be provided to the map processing device 100 for storage as pose data 107 associated with the features 106 in the image-based map 105.
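As an illustration of the pose data described above, the following sketch represents a six degree-of-freedom pose as a 4x4 homogeneous transform and derives relative pose data from two absolute poses expressed in a common world frame. It is a minimal example assuming NumPy; the function names are illustrative, not taken from the disclosure.

```python
import numpy as np

def pose_matrix(rotation, translation):
    """Build a 4x4 homogeneous pose from a 3x3 rotation and a 3-vector translation."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T

def relative_pose(T_world_a, T_world_b):
    """Relative pose taking points from camera frame b into camera frame a,
    derived from two absolute poses in a common world frame."""
    return np.linalg.inv(T_world_a) @ T_world_b

# Two absolute poses: identity, and a 1 m translation along x
T_a = pose_matrix(np.eye(3), [0.0, 0.0, 0.0])
T_b = pose_matrix(np.eye(3), [1.0, 0.0, 0.0])
T_ab = relative_pose(T_a, T_b)   # relative pose data derived from absolute pose data
```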
[0024] The structure-based map 102 and/or the image-based map 105 may be organized as data structures that can be represented as graph structures, as is commonly performed in structure-based and image-based SLAM approaches. A graph structure is defined by vertices and edges, where vertices contain the main map information, examples of which may include, without limitation: for structure-based SLAM, depth information, pointcloud or segment descriptors and their respective poses; or for image-based SLAM, keyframes, 2D features and their respective poses. Edges connect vertices which contain overlapping information relating to the same location in an environment, and contain the geometric transformation required to traverse from one vertex to an adjacent vertex. For example, if a set of features in vertex 1 are the same features observed in vertex 100, there should be an edge between them.
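A minimal sketch of such a graph structure is shown below, with vertices holding the main map information and edges holding the transformation between adjacent vertices. The class and field names are illustrative assumptions, not part of the disclosure.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class Vertex:
    """Main map information: e.g., a keyframe with its 2D features and pose
    (image-based SLAM) or a point-cloud segment and its pose (structure-based SLAM)."""
    pose: np.ndarray                       # 4x4 pose of the keyframe / segment
    payload: dict = field(default_factory=dict)

@dataclass
class Edge:
    """Connects two vertices that observe overlapping content and stores the
    geometric transformation needed to traverse between them."""
    src: int
    dst: int
    transform: np.ndarray                  # 4x4 relative transformation

class MapGraph:
    def __init__(self):
        self.vertices = {}                 # vertex id -> Vertex
        self.edges = []                    # list of Edge

    def add_vertex(self, vid, pose, **payload):
        self.vertices[vid] = Vertex(pose=pose, payload=payload)

    def connect(self, src, dst, transform):
        self.edges.append(Edge(src, dst, transform))

# Example: vertex 1 and vertex 100 observe the same features, so they get an edge
graph = MapGraph()
graph.add_vertex(1, np.eye(4), keyframe="kf_0001")
graph.add_vertex(100, np.eye(4), keyframe="kf_0100")
graph.connect(1, 100, np.eye(4))
```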
[0025] Continuing the above example operational scenario, the second device 120 may be a mixed reality headset worn by a person who is traveling through the same building as the first device 110. As the second device 120 travels through the building, the localization and mapping module 101 can augment the features 106 and possibly also the pose data 107 in the image-based map 105. The image-based map 105 can then be processed through an image-based localization algorithm of the localization and mapping module 101 to localize the second device 120 within the building.
[0026] Device localization can typically be performed more accurately using depth sensors and associated structure-based SLAM algorithms than using cameras and associated image-based SLAM algorithms. However, depth sensors can have a much higher cost and operationally consume more power than cameras. Thus, although the second device 120 may have a lower cost and lower power consumption than the first device 110, the second device 120 may be capable of less accurate localization using an image-based SLAM algorithm processing features 106 of the image-based map 105. It would be advantageous for the second device 120 to be localized using the depth information 104 of the structure-based map 102.
[0027] However, the image-based localization algorithm is not configured to be able to directly process the structure-based map 102 to localize the second device 120. Thus, when the first device 110 has assisted the map processing device 100 with mapping a portion of the building to the structure-based map 102, the image-based localization algorithm would not be able to directly use the structure-based map 102 to localize the second device 120 within that portion of the building.
[0028] Various embodiments of the present disclosure are directed to enabling an image-based localization algorithm to localize the second device 120 based on content of the structure-based map 102 and without necessitating modification of the image-based localization algorithm.
[0029] Figure 2 illustrates a flowchart of operations by a map processing device in accordance with some embodiments of the present disclosure. The map processing device may include a mobile device and/or may include a computing server, and may be part of the map processing device 100 and/or part of one or more other devices, such as the first device 110, the second device 120, and/or another network computing server.
[0030] Referring to Figure 2, the operations access 200 the structure-based map 102 including the depth information 104 and images from the first camera 113. As explained above, the depth information 104 includes a set of data points indicating locations in 3D space corresponding to features in the real-world sensed by the depth sensor 112, and at least some of the features sensed in the real-world correspond to features captured in the images of the structure-based map 102. The operations also access 202 the image-based map 105 including features 106 extracted from images from the camera 122 using a localization algorithm, i.e., image-based localization algorithm. The operations extract 204 the features from the images 103 of the structure-based map 102 using the localization algorithm. The operations identify 206 which of the features of the image-based map 105 correspond to which of the features extracted from the images 103 of the structure-based map 102. The pose data 107 may be used by the localization and mapping module 101 to assist with identifying 206 which of the features 106 of the image-based map 105 correspond to which of the features extracted from the images 103 of the structure-based map 102. The operations generate 208 map elements for the image-based map 105 based on the depth information 104 of the structure-based map 102 that corresponds to the features of the image-based map 105 which are identified as corresponding to the features extracted from the images 103 of the structure-based map 102.
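The sketch below illustrates one way the extraction 204 and identification 206 steps could look in practice, assuming OpenCV's ORB detector and a brute-force matcher purely as stand-ins for whatever feature extractor and matcher the image-based localization algorithm actually uses; the function name and parameter values are illustrative assumptions.

```python
import cv2

def match_map_features(structure_map_image, image_map_descriptors):
    """Extract features from a structure-based map image and match them against
    descriptors already stored in the image-based map.

    ORB is used only as a stand-in feature extractor for this sketch.
    """
    orb = cv2.ORB_create(nfeatures=1000)
    keypoints, descriptors = orb.detectAndCompute(structure_map_image, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(image_map_descriptors, descriptors)

    # Keep the most reliable correspondences between the two maps
    matches = sorted(matches, key=lambda m: m.distance)[:200]
    return keypoints, matches
```

The returned matches pair features of the image-based map with features extracted from a structure-based map image, which is the correspondence needed before depth information can be transferred in step 208.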
[0031] In some further embodiments, the map processing device provides the map elements from the image-based map 105 to the localization algorithm for processing to determine a pose of the second device 120.
[0032] Although some embodiments are described in the context of the features of the image-based map 105 being extracted from images obtained from a monocular camera, e.g., first camera 113 of the first device 110, such features may be extracted from images captured by any type of camera, e.g., a stereo camera, and may be captured in visible and/or non-visible wavelengths of light.
[0033] In some embodiments, the operation to generate 208 (Fig. 2) the map elements for the image-based map 105 includes, for one of the features 106 of the image-based map 105 which is identified as corresponding to one of the features extracted from one of the images 103 of the structure-based map 102, assigning a location in the 3D space to the feature 106 of the image-based map 105 based on which of the data points among the set of the structure-based map 102 is nearest to the one of the features extracted from one of the images 103 of the structure-based map 102 that is identified as corresponding to the one of the features of the image-based map 105.
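As a concrete illustration of this assignment, the sketch below projects the structure-based map's data points into the camera frame of a structure-based map image and assigns to the matched 2D feature the 3D location of the data point whose projection is nearest to it. The intrinsics, pose convention, and function name are assumptions made for illustration, and the returned distance ties into the threshold check described in the densification embodiments below.

```python
import numpy as np

def assign_3d_location(feature_uv, depth_points, T_camera_map, fx, fy, cx, cy):
    """Assign a 3D location to a 2D feature of the image-based map from the
    nearest projected data point of the structure-based map.

    depth_points: (N, 3) data points in map coordinates.
    T_camera_map: 4x4 transform from map coordinates into the camera frame of
                  the structure-based map image in which the feature was matched.
    """
    # Transform the data points into the camera frame and project them to pixels
    pts_h = np.hstack([depth_points, np.ones((len(depth_points), 1))])
    pts_cam = (T_camera_map @ pts_h.T).T[:, :3]
    in_front = pts_cam[:, 2] > 0
    pts_cam = pts_cam[in_front]
    u = fx * pts_cam[:, 0] / pts_cam[:, 2] + cx
    v = fy * pts_cam[:, 1] / pts_cam[:, 2] + cy

    # Nearest projected data point to the feature's pixel location
    dist = np.hypot(u - feature_uv[0], v - feature_uv[1])
    nearest = np.argmin(dist)
    return depth_points[in_front][nearest], dist[nearest]
```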
[0034] Some further embodiments are directed to operations that can densify data from the structure-based map 102 to improve useability and accuracy of the data for updating of the image-based map 105. Figure 3 illustrates a flowchart of corresponding operations by the map processing device for densification of data from the structure-based map 102 in accordance with some embodiments of the present disclosure.
[0035] Referring to Figure 3, the operation to generate 208 (Fig. 2) the map elements for the image-based map 105 based on the depth information 104 of the structure-based map 102 that corresponds to the features 106 of the image-based map 105 which are identified as corresponding to the features extracted 204 (Fig. 2) from the images 103 of the structure-based map 102, includes computing 300 a distance from the nearest one of the data points among the set of the structure-based map 102 to the one of the features extracted 204 (Fig. 2) from one of the images 103 of the structure-based map 102 that is identified as corresponding to the one of the features 106 of the image-based map 105. When the distance that is computed is determined 302 to be greater than a threshold distance, the operations perform densification 304 of the set of data points of the structure-based map 102 to generate a densified set of data points. The operations then assign 306 a location in the 3D space to the feature of the image-based map 105 based on which of the data points among the densified set of data points is nearest to the one of the features extracted from the one of the images of the structure-based map 102 that is identified as corresponding to the one of the features of the image-based map. In some embodiments, the densification 304 operation can be terminated when the densification of the set of data points provides a depth value for a 2D feature (detected using an image-based feature detector) in the image that is to be added to the image-based map 105.
[0036] The threshold distance that is used for the determination 302 may ideally be 0 (zero) to maximize accuracy. However, the value of the threshold distance can be determined based on a trade-off between reducing the amount of processing resources utilized for the densification operation 304 and the assigning operation 306, and maintaining an acceptable level of accuracy.
[0037] In some embodiments, the densification of the set of data points of the structure-based map 102 is performed using a depth completion algorithm such as a non-local spatial propagation network (NLSPN) algorithm. An example NLSPN that can be used is described in “Non-local Spatial Propagation Network for Depth Completion”, by J. Park, K. Joo, Z. Hu, C.-K. Liu, I. So Kweon, in “Computer Vision - ECCV 2020”, Lecture Notes in Computer Science, pages 120-136, vol. 12358, Springer, 2020. Additionally, or alternatively, the densification of the set of data points of the structure-based map 102 may be performed using interpolation or extrapolation among a plurality of the data points in the set, where the plurality includes the nearest one of the data points among the set.
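For the interpolation-based alternative, a minimal sketch is shown below that fills in missing pixels of a sparse depth image by linear interpolation among the measured data points, assuming SciPy is available. It is a deliberately simple stand-in for a learned depth-completion network such as NLSPN; the function name is an illustrative assumption.

```python
import numpy as np
from scipy.interpolate import griddata

def densify_sparse_depth(sparse_depth):
    """Densify a sparse depth image by interpolating among the known data points.

    Pixels with no depth return (value 0) are filled by linear interpolation
    among the surrounding measured pixels.
    """
    h, w = sparse_depth.shape
    known = sparse_depth > 0
    grid_v, grid_u = np.mgrid[0:h, 0:w]
    dense = griddata(
        points=np.stack([grid_u[known], grid_v[known]], axis=-1),
        values=sparse_depth[known],
        xi=(grid_u, grid_v),
        method="linear",
    )
    return dense   # NaN where interpolation is not possible (outside the convex hull)
```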
[0038] Some further embodiments are directed to operations that determine a section of the structure-based map 102 that should be used to update the image-based map 105. Figure 4 illustrates a flowchart of related operations by the map processing device in accordance with some embodiments of the present disclosure.
[0039] Referring to Figure 4, the operations determine 400 a section of the structure-based map 102 comprising the depth information 104 corresponding to features in the real-world that correspond to features in the image-based map 105 that do not have assigned locations in the 3D space. The operations perform 402 the generation 208 (Fig. 2) of the map elements for the image-based map based on the depth information 104 of the section of the structure- based map 102. The operations then combine 404 the map elements with the image-based map 105.
[0040] The operations to determine 400 the section of the structure-based map 102 may be performed using 3D matching operations that identify correspondences between a structure- based map (dense) and an image-based map (sparse), for example using the Iterative Closest Point (ICP) method, as proposed in “Monocular camera localization in 3D LiDAR maps”, by T. Caselitz, B. Steder, M. Ruhnke, & W. Burgard, in “2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)”, IEEE, 2016.
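As an illustrative sketch only, a point-to-point ICP alignment between the dense structure-based point set and the sparse image-based landmark positions could be set up with a library such as Open3D (assuming a recent Open3D release in which ICP is provided under o3d.pipelines.registration); the correspondence distance and the identity initial transform are placeholder values, not parameters prescribed by the embodiments.

```python
import numpy as np
import open3d as o3d

def align_maps_icp(dense_xyz, sparse_xyz, max_corr_dist=0.5):
    # Build point clouds for the dense structure-based map data points and
    # the sparse image-based map landmark positions.
    source = o3d.geometry.PointCloud()
    source.points = o3d.utility.Vector3dVector(dense_xyz)
    target = o3d.geometry.PointCloud()
    target.points = o3d.utility.Vector3dVector(sparse_xyz)
    # Point-to-point ICP refines the initial guess into the transform that
    # aligns the structure-based map to the image-based map frame.
    result = o3d.pipelines.registration.registration_icp(
        source, target, max_corr_dist, np.eye(4),
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation
```

The resulting transform then allows the section of the structure-based map 102 that overlaps the unassigned features of the image-based map 105 to be identified.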
[0041] It can be computationally efficient to perform the determination 400 in a network computing server, e.g., a cloud computer, which stores both the structure-based map 102 and the image-based map 105 because of the amount of data from both maps 102 and 105 that is processed when performing the matching operation.
[0042] Further definitions and embodiments are now explained below.
[0043] In the above description of various embodiments of present inventive concepts, it is to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of present inventive concepts. Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which present inventive concepts belong. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of this specification and the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
[0044] When an element is referred to as being "connected", "coupled", "responsive", or variants thereof to another element, it can be directly connected, coupled, or responsive to the other element or intervening elements may be present. In contrast, when an element is referred to as being "directly connected", "directly coupled", "directly responsive", or variants thereof to another element, there are no intervening elements present. Like numbers refer to like elements throughout. Furthermore, "coupled", "connected", "responsive", or variants thereof as used herein may include wirelessly coupled, connected, or responsive. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Well-known functions or constructions may not be described in detail for brevity and/or clarity. The term "and/or" includes any and all combinations of one or more of the associated listed items.

[0045] It will be understood that although the terms first, second, third, etc. may be used herein to describe various elements/operations, these elements/operations should not be limited by these terms. These terms are only used to distinguish one element/operation from another element/operation. Thus, a first element/operation in some embodiments could be termed a second element/operation in other embodiments without departing from the teachings of present inventive concepts. The same reference numerals or the same reference designators denote the same or similar elements throughout the specification.
[0046] As used herein, the terms "comprise", "comprising", "comprises", "include", "including", "includes", "have", "has", "having", or variants thereof are open-ended, and include one or more stated features, integers, elements, steps, components or functions but do not preclude the presence or addition of one or more other features, integers, elements, steps, components, functions or groups thereof. Furthermore, as used herein, the common abbreviation "e.g.", which derives from the Latin phrase "exempli gratia," may be used to introduce or specify a general example or examples of a previously mentioned item and is not intended to be limiting of such item. The common abbreviation "i.e.", which derives from the Latin phrase "id est," may be used to specify a particular item from a more general recitation.
[0047] Example embodiments are described herein with reference to block diagrams and/or flowchart illustrations of computer-implemented methods, apparatus (systems and/or devices) and/or computer program products. It is understood that a block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions that are performed by one or more computer circuits. These computer program instructions may be provided to a processor circuit of a general purpose computer circuit, special purpose computer circuit, and/or other programmable data processing circuit to produce a machine, such that the instructions, which execute via the processor of the computer and/or other programmable data processing apparatus, transform and control transistors, values stored in memory locations, and other hardware components within such circuitry to implement the functions/acts specified in the block diagrams and/or flowchart block or blocks, and thereby create means (functionality) and/or structure for implementing the functions/acts specified in the block diagrams and/or flowchart block(s).
[0048] These computer program instructions may also be stored in a tangible computer- readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instructions which implement the functions/acts specified in the block diagrams and/or flowchart block or blocks. Accordingly, embodiments of present inventive concepts may be embodied in hardware and/or in software (including firmware, resident software, micro-code, etc.) that runs on a processor such as a digital signal processor, which may collectively be referred to as "circuitry," "a module" or variants thereof.
[0049] It should also be noted that in some alternate implementations, the functions/acts noted in the blocks may occur out of the order noted in the flowcharts. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Moreover, the functionality of a given block of the flowcharts and/or block diagrams may be separated into multiple blocks and/or the functionality of two or more blocks of the flowcharts and/or block diagrams may be at least partially integrated. Finally, other blocks may be added/inserted between the blocks that are illustrated, and/or blocks/operations may be omitted without departing from the scope of inventive concepts. Moreover, although some of the diagrams include arrows on communication paths to show a primary direction of communication, it is to be understood that communication may occur in the opposite direction to the depicted arrows.
[0050] Many variations and modifications can be made to the embodiments without substantially departing from the principles of the present inventive concepts. All such variations and modifications are intended to be included herein within the scope of present inventive concepts. Accordingly, the above disclosed subject matter is to be considered illustrative, and not restrictive, and the appended examples of embodiments are intended to cover all such modifications, enhancements, and other embodiments, which fall within the spirit and scope of present inventive concepts. Thus, to the maximum extent allowed by law, the scope of present inventive concepts is to be determined by the broadest permissible interpretation of the present disclosure including the following examples of embodiments and their equivalents and shall not be restricted or limited by the foregoing detailed description.

Claims

CLAIMS:
1. A map processing device (100, 110, 120) comprising: at least one processor (109) configured to: access a structure-based map (102) comprising depth information (104) from a depth sensor (112) and images (103) from a first camera (113), wherein the depth information comprises a set of data points indicating locations in three dimensional, 3D, space corresponding to features in the real-world sensed by the depth sensor, and wherein at least some of the features sensed in the real-world correspond to features captured in the images of the structure-based map; access an image-based map (105) comprising features (106) extracted from images from a second camera (122) using a localization algorithm; extract the features from the images (103) of the structure-based map (102) using the localization algorithm; identify which of the features (106) of the image-based map (105) correspond to which of the features extracted from the images (103) of the structure-based map (102); and generate map elements for the image-based map (105) based on the depth information (104) of the structure-based map (102) that corresponds to the features (106) of the image-based map (105) which are identified as corresponding to the features extracted from the images (103) of the structure-based map (102).
2. The map processing device (100, 110, 120) of Claim 1, wherein: the image-based map (105) further comprises pose data (107) indicating poses of the second camera (122) when the images were captured and/or transformations between the poses; and the identification of which of the features (106) of the image-based map (105) correspond to which of the features extracted from the images (103) of the structure-based map (102) is performed using the pose data.
3. The map processing device (100, 110, 120) of any of Claims 1 to 2, wherein the at least one processor (109) is further configured to generate the map elements for the image-based map (105) based on the depth information (104) of the structure-based map (102) that corresponds to the features (106) of the image-based map (105) which are identified as corresponding to the features extracted from the images (103) of the structure- based map (102), by operations comprising: for one of the features of the image-based map which is identified as corresponding to one of the features extracted from one of the images of the structure-based map, assigning a location in the 3D space to the feature of the image-based map based on which of the data points among the set of the structure-based map is nearest to the one of the features extracted from one of the images of the structure-based map that is identified as corresponding to the one of the features of the image- based map.
4. The map processing device (100, 110, 120) of Claim 3, wherein the at least one processor (109) is further configured to generate the map elements for the image-based map based on the depth information of the structure-based map that corresponds to the features of the image-based map which are identified as corresponding to the features extracted from the images of the structure-based map, by operations comprising: computing a distance from the nearest one of the data points among the set of the structure-based map to the one of the features extracted from one of the images of the structure-based map that is identified as corresponding to the one of the features of the image-based map; and when the distance that is computed is greater than a threshold distance, performing densification of the set of data points of the structure-based map to generate a densified set of data points, and assigning a location in the 3D space to the feature of the image-based map based on which of the data points among the densified set of data points is nearest to the one of the features extracted from the one of the images of the structure- based map that is identified as corresponding to the one of the features of the image-based map.
5. The map processing device (100, 110, 120) of Claim 4, wherein the at least one processor (109) is further configured to perform densification of the set of data points of the structure-based map using a depth completion algorithm.
6. The map processing device (100, 110, 120) of Claim 4, wherein the at least one processor (109) is further configured to perform densification of the set of data points of the structure-based map using interpolation or extrapolation among a plurality of the data points in the set, the plurality including the nearest one of the data points among the set.
7. The map processing device (100, 110, 120) of any of Claims 1 to 6, wherein the at least one processor is further configured to: determine a section of the structure-based map comprising depth information corresponding to features in the real-world that correspond to features in the image-based map that do not have assigned locations in the 3D space; perform the generation of the map elements for the image-based map based on the depth information of the section of the structure-based map; and combine the map elements with the image-based map.
8. The map processing device (100, 110, 120) of any of Claims 1 to 7, wherein the map processing device comprises a mobile device (110, 120).
9. The map processing device (100, 110, 120) of any of Claims 1 to 7, wherein the map processing device comprises a computing server (100).
10. The map processing device (100, 110, 120) of any of Claims 1 to 9, wherein the features of the image-based map are extracted from images obtained from a monocular camera or a stereo camera.
11. The map processing device (100, 110, 120) of any of Claims 1 to 10, wherein the at least one processor is further configured to: provide the map elements from the image-based map to the localization algorithm for processing to determine a pose of a mobile device which includes the second camera.
12. A method performed by a map processing device comprising: accessing (200) a structure-based map comprising depth information from a depth sensor and images from a first camera, wherein the depth information comprises a set of data points indicating locations in three dimensional, 3D, space corresponding to features in the real-world sensed by the depth sensor, and wherein at least some of the features sensed in the real-world correspond to features captured in the images of the structure-based map; accessing (202) an image-based map comprising features extracted from images from a second camera using a localization algorithm; extracting (204) the features from the images of the structure-based map using the localization algorithm; identifying (206) which of the features of the image-based map correspond to which of the features extracted from the images of the structure-based map; and generating (208) map elements for the image-based map based on the depth information of the structure-based map that corresponds to the features of the image-based map which are identified as corresponding to the features extracted from the images of the structure-based map.
13. The method of Claim 12, wherein: the image-based map further comprises pose data indicating poses of the second camera when the images were captured and/or transformations between the poses; and the identification (206) of which of the features of the image-based map correspond to which of the features extracted from the images of the structure-based map is performed using the pose data.
14. The method of any of Claims 12 to 13, wherein the generation (208) of the map elements for the image-based map based on the depth information of the structure-based map that corresponds to the features of the image-based map which are identified as corresponding to the features extracted from the images of the structure-based map, comprises: for one of the features of the image-based map which is identified as corresponding to one of the features extracted from one of the images of the structure-based map, assigning a location in the 3D space to the feature of the image-based map based on which of the data points among the set of the structure-based map is nearest to the one of the features extracted from one of the images of the structure-based map that is identified as corresponding to the one of the features of the image- based map.
15. The method of Claim 14, wherein the generation (208) of the map elements for the image-based map based on the depth information of the structure-based map that corresponds to the features of the image-based map which are identified as corresponding to the features extracted from the images of the structure-based map, further comprises: computing (300) a distance from the nearest one of the data points among the set of the structure-based map to the one of the features extracted from one of the images of the structure-based map that is identified as corresponding to the one of the features of the image-based map; and when the distance that is computed is greater (302) than a threshold distance, performing (304) densification of the set of data points of the structure-based map to generate a densified set of data points, and assigning (306) a location in the 3D space to the feature of the image-based map based on which of the data points among the densified set of data points is nearest to the one of the features extracted from the one of the images of the structure-based map that is identified as corresponding to the one of the features of the image-based map.
16. The method of Claim 15, wherein the densification (304) of the set of data points of the structure-based map is performed using a depth completion algorithm.
17. The method of Claim 15, wherein the densification (304) of the set of data points of the structure-based map is performed using interpolation or extrapolation among a plurality of the data points in the set, the plurality including the nearest one of the data points among the set.
18. The method of any of Claims 12 to 17, wherein the method further comprises: determining (400) a section of the structure-based map comprising depth information corresponding to features in the real-world that correspond to features in the image-based map that do not have assigned locations in the 3D space; performing (402) the generation of the map elements for the image-based map based on the depth information of the section of the structure-based map; and combining (404) the map elements with the image-based map.
19. The method of any of Claims 12 to 18, wherein the map processing device comprises a mobile device.
20. The method of any of Claims 12 to 18, wherein the map processing device comprises a computing server.
21. The method of any of Claims 12 to 20, wherein the features of the image- based map are extracted from images obtained from a monocular camera or a stereo camera.
22. The method of any of Claims 12 to 21, wherein the method further comprises: providing the map elements from the image-based map to the localization algorithm for processing to determine a pose of a mobile device which includes the second camera.
23. A computer program product comprising: a non-transitory computer readable medium (108, 116, 126) storing program code executable by at least one processor of a map processing device to perform operations comprising: accessing (200) a structure-based map comprising depth information from a depth sensor and images from a first camera, wherein the depth information comprises a set of data points indicating locations in three dimensional, 3D, space corresponding to features in the real-world sensed by the depth sensor, and wherein at least some of the features sensed in the real-world correspond to features captured in the images of the structure-based map; accessing (202) an image-based map comprising features extracted from images from a second camera using a localization algorithm; extracting (204) the features from the images of the structure-based map using the localization algorithm; identifying (206) which of the features of the image-based map correspond to which of the features extracted from the images of the structure-based map; and generating (208) map elements for the image-based map based on the depth information of the structure-based map that corresponds to the features of the image- based map which are identified as corresponding to the features extracted from the images of the structure-based map.
24. The computer program product of Claim 23, wherein the non-transitory computer readable medium (108, 116, 126) further stores program code executable by the at least one processor to perform the method of any of Claims 12 to 22.
Non-Patent Citations (5)

Feng, Guanyuan, et al., "Visual Map Construction Using RGB-D Sensors for Image-Based Localization in Indoor Environments", Journal of Sensors, vol. 2017, pages 1-18, 1 January 2017, ISSN 1687-725X, DOI: 10.1155/2017/8037607, XP055845025.

Yu, Huai, et al., "Monocular Camera Localization in Prior LiDAR Maps with 2D-3D Line Correspondences", arXiv (Cornell University Library), 31 July 2020, XP081726125.

Park, J., Joo, K., Hu, Z., Liu, C.-K., So Kweon, I., "Non-local Spatial Propagation Network for Depth Completion", in Computer Vision - ECCV 2020, Lecture Notes in Computer Science, vol. 12358, Springer, 2020, pages 120-136.

Loo, Shing Yan, et al., "DeepRelativeFusion: Dense Monocular SLAM using Single-Image Relative Depth Prediction", arXiv (Cornell University Library), 8 October 2020, XP081781897.

Caselitz, T., Steder, B., Ruhnke, M., Burgard, W., "Monocular camera localization in 3D LiDAR maps", 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, 2016.

