WO2019125262A1 - A method and system for simultaneous navigation and mapping of a space - Google Patents

A method and system for simultaneous navigation and mapping of a space

Info

Publication number
WO2019125262A1
Authority
WO
WIPO (PCT)
Prior art keywords
images
dimensional map
map segment
image
parts
Prior art date
Application number
PCT/SE2017/051341
Other languages
French (fr)
Inventor
Jimmy Jonsson
Original Assignee
Saab Ab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Saab Ab filed Critical Saab Ab
Priority to PCT/SE2017/051341 priority Critical patent/WO2019125262A1/en
Publication of WO2019125262A1 publication Critical patent/WO2019125262A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20 Image signal generators
    • H04N13/204 Image signal generators using stereoscopic image cameras
    • H04N13/254 Image signal generators using stereoscopic image cameras in combination with electromagnetic radiation sources for illuminating objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/285 Analysis of motion using a sequence of stereo image pairs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/55 Depth or shape recovery from multiple images
    • G06T7/579 Depth or shape recovery from multiple images from motion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/55 Depth or shape recovery from multiple images
    • G06T7/586 Depth or shape recovery from multiple images from multiple light sources, e.g. photometric stereo
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20 Image signal generators
    • H04N13/204 Image signal generators using stereoscopic image cameras
    • H04N13/207 Image signal generators using stereoscopic image cameras using a single 2D image sensor
    • H04N13/221 Image signal generators using stereoscopic image cameras using a single 2D image sensor using the relative movement between cameras and objects
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20 Image signal generators
    • H04N13/204 Image signal generators using stereoscopic image cameras
    • H04N13/239 Image signal generators using stereoscopic image cameras using two 2D image sensors having a relative position equal to or related to the interocular distance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G06T2207/10021 Stereoscopic video; Stereoscopic image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N2013/0074 Stereoscopic image analysis
    • H04N2013/0081 Depth or disparity estimation from stereoscopic image signals

Definitions

  • the present disclosure relates to a method and system for simultaneous navigation and mapping of a space based on images captured by a moving platform.
  • simultaneous localization and mapping is the computational problem of constructing or updating a map of an unknown environment while simultaneously keeping track of the location of the robot within it. While this initially appears to be a chicken-and-egg problem, there are several algorithms known for solving it, at least approximately, for certain conditions. Approximate solution methods may include the particle filter and extended Kalman filter.
  • SLAM algorithms are tailored to the available resources, hence not aimed at perfection, but at operational compliance. Published approaches are employed in self-driving cars, unmanned aerial vehicles, autonomous underwater vehicles, planetary rovers, newly emerging domestic robots and even inside the human body.
  • Statistical techniques used to approximate the solutions include Kalman filters, particle filters (Monte Carlo methods) and scan matching of range data. These approximate solutions provide an estimation of the posterior probability function for the pose of the robot and for the parameters of the map.
  • Set-membership techniques are mainly based on interval constraint propagation. The known techniques provide a set which encloses the pose of the robot and a set approximation of the map.
  • Bundle adjustment is another known technique for SLAM using image data. Bundle adjustment jointly estimates poses and landmark positions, increasing map fidelity.
  • SLAM algorithms may be driven by differing requirements and assumptions about the types of maps, sensors and models.
  • An object of the disclosure is to obtain an improved method for simultaneous navigation and mapping of a space.
  • the method is based on use of stereo camera pairs and stereo image processing.
  • the method may for example be implemented at an Unmanned Aerial Vehicle, UAV.
  • UAV refers to an engine driven aerial vehicle without a pilot or remote control.
  • the UAV may be formed in any size.
  • A UAV is however only an example.
  • the method disclosed herein can be implemented on any movable object.
  • the space may be under water.
  • the space may be above ground.
  • Different embodiments of the present disclosure relate to a method for simultaneous navigation and mapping of a space based on images captured by a moving platform.
  • the method comprises capturing a set of images comprising at least two two-dimensional images, said set of images being at least partly overlapping, forming a three dimensional map segment based on the overlap part of the set of images, and processing the formed three-dimensional map segment.
  • the processing of the formed three-dimensional map segment comprises detecting points or parts in the three dimensional map segment having characteristic features, and updating a map with the formed map segment.
  • the detected points or parts in the three dimensional map segment having characteristic features are compared to at least one preceding point or part detection, a movement of the platform is estimated based on the comparison, and the position of the platform is estimated based on the estimated movement of the platform.
  • a three dimensional mapping of the space is built up during use of a camera arrangement arranged to capture overlapping image pairs so as to obtain depth data related to the captured images.
  • the camera arrangement may comprise at least two cameras arranged at a distance from each other.
  • the building of the three dimensional map may be performed in real time.
  • each position is given in relation to a starting or reference position.
  • the reference position may be the position where measurements started.
  • the position may be given in a three dimensional coordinate system.
  • the reference position may be given a geographical coordinate determined when initiating measurements.
  • the geographical coordinate may be given in a global coordinate system.
  • the geographical coordinate may for example be obtained from a GPS receiver.
  • the method is typically very well suited for building three dimensional maps in secluded spaces such as the inside of buildings etc.
  • the method is very well suited for building three dimensional maps where the distances to the mapped objects are tens of meters or less.
  • the method is very well suited for mapping static environments, wherein the captured images mainly comprise static objects. If moving objects are in the captured images, these may be filtered out and/or disregarded.
  • built-up three dimensional maps may be exchanged between platforms. Accordingly, each platform can build three dimensional maps based also on information received from other platforms.
  • the method further comprises a step of obtaining an improved position, wherein the estimated position is used for selecting a map part, whereupon the improved estimated position is determined based on the estimated position and an identified corresponding position in the selected map part. This helps ensure that drift in the estimation of the position of the platform is avoided.
  • the method further comprises a step of globally updating the map.
  • detected points or parts of the map are summed up or weighted with previously made detections.
  • the step of globally updating the map comprises accessing the latest detected points or parts in the three dimensional map segment having characteristic features, searching the map to find a match with other previously detected points or parts having said accessed characteristic features, and globally updating the map to sum up or weight coordinates of the accessed detected points or parts having the characteristic features with coordinates of corresponding found matching detected points or parts having the characteristic features.
  • the global update of the match can be made using for example an Iterative Closest Point, ICP, method.
  • This globally updating step may for example be performed upon detection that a formed map segment has coordinates coinciding with or within a predetermined distance from coordinates of the map before being updated with the formed map segment.
  • the step of detecting points or parts in the three dimensional map segment having characteristic features comprises detecting said points in a first image of the image set and detecting said parts or points in the second image of the image set.
  • the step of comparing the detected points or parts in the three dimensional map segment having characteristic features to at least one preceding point or part detection comprises estimation of a rotation and a translation of a point cloud based on the detected features in the first and second images and the first and second preceding images, so called Procrustes, and based on an assumption that the characteristic features are stationary.
  • the method further comprises a step of projecting light when capturing at least a subset of the images, wherein the step of forming a three dimensional map segment is based at least partly on the images comprising the projected light, wherein said light may have a characteristic pattern.
  • When a light emitting element arranged to generate the projected light is mounted on the platform, the occurrence of potential shadows in the images is reduced.
  • when the light emitting element is mounted at the platform in a known relation to the platform, this information may be used.
  • the light emitting element is not mounted at the platform.
  • the light emitting element may for example be stationary located within the space.
  • the step of forming a three dimensional map segment then may comprise comparison of two 3D point clouds, wherein the clouds are translated/rotated to find a match.
  • a so called Iterative Closest Point, ICP, method may then be used in the formation of the three dimensional map segment.
  • the step of forming a three dimensional map segment may comprise associating calibration information to the respective image of the image set.
  • the association of calibration information to the respective image comprises associating a set of calibration parameters to the respective image.
  • the calibration may be performed before measurements are started.
  • the calibration may be made with regard to intrinsic parameters such as lens distortion, focal length etc. Instead or in addition thereto, the calibration may be made with regard to extrinsic parameters.
  • the calibration can be made before measurements start and accordingly, the same calibration information can be associated to all the images.
  • the step of forming a three dimensional map segment may comprise rectifying the respective image.
  • the two dimensional images can then be rectified based on the calibration information.
  • Rectifying the images means that the images are transformed so as to be projected onto a common image plane. By way of rectification of the two dimensional images, the subsequent step of detection of matching points is facilitated. Any point in one rectified image will have a matching point along the so called epipolar line in the other images of the image set. This means that the search for a matching point is only required along a line in the other image at subsequent stereo image processing. This saves processing resources. Thus, processing speed can be increased.
  • Fig 1 is a flow chart illustrating an example of a method for simultaneous navigation and mapping of a space based on images captured by a moving platform.
  • Fig 2 is a flow chart illustrating an example of a processing step of a method for simultaneous navigation and mapping of a space.
  • Fig 3 is a block diagram illustrating an example of a system for simultaneous navigation and mapping of a space based on images captured by a moving platform.
  • Figure 1 illustrates an example of a method 100 for simultaneous navigation and mapping of a space based on images captured by a moving platform.
  • the moving platform may be a person travelling by foot or in/on a vehicle.
  • the moving platform is a vehicle.
  • the vehicular platform may for example be an aerial vehicle such as an Unmanned Aerial Vehicle, UAV.
  • UAV refers to an engine driven aerial vehicle without a pilot or remote control.
  • the UAV may be formed in any size.
  • the examples given are just examples.
  • the method disclosed herein can be implemented on any movable platform.
  • the method comprises a step of capturing S2 a set of images.
  • the set of images comprises at least two two-dimensional images which are at least partly overlapping.
  • the method further comprises a step of forming S5 a three dimensional map segment based on the overlap part of the set of images.
  • Techniques for forming three dimensional data based on overlapping two-dimensional images are well known in the art and are not described in detail herein.
  • depth perception from stereo vision is based on the triangulation principle.
  • the scene is captured from two different viewpoints for example by capturing the images with two cameras.
  • the points or parts in the respective image can be determined based on any technique known in the art.
  • the points or parts in the respective image can be determined based on feature matching, wherein characterizing features are identified in the respective image and matched.
  • a corresponding point or part is determined in the second image, for example by way of feature matching.
  • the ray intersection point for corresponding points or parts of the first and second images is calculated based on a known camera geometry set up.
  • the geometrical or spatial relation between the cameras capturing the images is known.
  • the first and second cameras may be a left and a right camera.
  • the step of forming the three dimensional map segment may comprise associating calibration information to the respective image of the image set.
  • the geometrical or spatial relation between the cameras may be computed a priori in a stereo calibration process.
  • the stereo calibration process may involve using a calibration object.
  • the calibration object is for example a calibration plate having a visual pattern which is known.
  • respective image pairs can then be captured by two cameras having a fixed geometrical or spatial relation, showing the pattern at different positions, orientations and distances in both cameras, wherein the first image of the respective image pair originates from the first camera and the second image of the respective image pair originates from the second camera of the two cameras.
  • the known pattern of the calibration plate and the representations of the pattern in the respective image can be used in the calibration process.
  • information related to the representation of the pattern in the respective image can be used to determine the geometrical or spatial relation between the two cameras, i.e. rotation and shift in three dimensions between the first and second camera.
  • This is known as an extrinsic parameter.
  • the extrinsic parameters denote the coordinate system transformations from 3D world coordinates to 3D camera coordinates.
  • the extrinsic parameters define the position of the camera center and the camera ' s heading in world coordinates.
  • the captured images of the calibration object may also be used to determine intrinsic parameters of the respective camera.
  • the intrinsic camera parameters may include parameters such as lens distortion, focal length, etc.
  • the intrinsic and extrinsic parameters together form a stereo camera model.
  • the intrinsic and/or extrinsic parameters can be determined by any other method known in the art.
  • the calibration information associated to the respective image is used to triangulate corresponding points that have been identified in both images and recover their metric coordinates with respect to the camera setup.
  • the formation of a three dimensional map segment characteristically involves processing the captured images associated with their calibration information so as to obtain a three dimensional point cloud or surface representing the map segment.
  • the step of forming the three dimensional map segment may further comprise forming rectification information for the respective image based on the calibration information.
  • Image rectification is a transformation process used to project two or more images onto a common image plane.
  • the rectification information represents the information upon which the transformation process is based.
  • corresponding image points or parts in the first and second images are identified.
  • the entire second image is searched for finding matching points or parts. This is however generally too time consuming to be done in real time.
  • the geometry of the two cameras allows for restricting the search to a one dimensional line in the second image, the so called epipolar line.
  • the three dimensional map segment can then be formed based on the rectification information.
  • the three dimensional map segment can in accordance therewith be formed based on matching a point in one image of the set of images with another point in another image of the set of the images along the so called epipolar line.
  • the formed three dimensional map segment may be represented by a three dimensional point cloud or surface.
  • the three dimensional point cloud or surface may comprise texture information.
  • the captured image sets may be stored for use in subsequent steps in the method.
  • disparity images obtained based on the respective image sets when forming the three dimensional map segment may be stored.
  • the disparity images may be formed on sub pixel level.
  • depth maps formed based on the disparity images when forming the three dimensional map segment are stored. The disparity images and/or depth maps may be used in subsequent steps in the method, as will be discussed later.
  • the method comprises further a step of processing S6 the formed three-dimensional map segment.
  • the processing S6 comprises updating a map with the formed three dimensional map segment.
  • the processing further comprises estimating a position of at least one of the cameras.
  • a continuous updating of estimated positions of the platform is provided.
  • the estimated positions are given in relation to a starting or reference position.
  • the reference position may be the position where measurements started.
  • the reference position and/or estimated positions may be given in a three dimensional coordinate system connected to initiation of measurements.
  • a three dimensional map is built up simultaneously with the continuous providing of the estimated positions of the platform. The coordinates of the three dimensional map are given in relation to the starting or reference position.
  • the processing step will be described in further detail later.
  • the method may comprise a step of projecting S1 light within the field of view of the camera(s). The step of projecting S1 light is performed at least when the images are captured S2, or when some of the images are captured.
  • the processing S6 step may be arranged to determine whether light projected into the images would significantly improve the results obtained in the processing step and control the activation and deactivation of projecting light into the field of view of the images based thereon.
  • the light is constantly projected into the field of view of the camera(s) capturing the images.
  • the light is manually activated/deactivated.
  • the light may be projected while the step of capturing S2 a set of images is performed.
  • the projected light may be functioning as a flashlight.
  • the wavelength(s) of the projecting light is characteristically within a range which can be registered by the camera(s).
  • the projecting light has in one example a characteristic pattern.
  • the projecting light can then be detected within the captured images.
  • the three dimensional map segment may be formed based on the detected projected light.
  • the forming of the three dimensional map segment may be determined based on the detected characteristic pattern in the images.
  • the subsequent processing S6 step may also be based on the detected light in the respective images.
  • As the light source characteristically is mounted to a platform comprising the cameras, the light source follows the movement of the platform. Comparing successive images with regard to characteristic patterns of the light is more complicated and demanding than comparing actual characteristics within a scene, as the projected light characteristically moves between the images while the characteristic features of the scene are static. Therefore, the projected light might even disturb the process of detecting points or parts in the three dimensional map segment having characteristic features.
  • this information can be used to improve the accuracy in forming the three dimensional map segment.
  • some of the images are captured with the projected light and some images are not.
  • the light emitting element is alternately turned on and off.
  • the images having the projected light may then be used as stated above for improving the accuracy in forming the three dimensional map segment.
  • Those images which do not comprise the projected light may be used in the subsequent processing step of processing the formed three dimensional map segment.
  • any potential disturbance from the projected light is avoided in the process of detecting points or parts in the three dimensional map segment having characteristic features.
  • the step of forming a three dimensional map segment is based at least partly on the images comprising the projected light.
  • the projected light has preferably a characteristic pattern or it functions as a flashlight.
  • the light emitting element is not mounted at the platform.
  • the light emitting element may for example be stationary located within the space.
  • the step of forming a three dimensional map segment then may comprise comparison of two 3D point clouds, wherein the clouds are translated/rotated to find a match.
  • a so called Iterative Closest Point, ICP, method may for example then be used in the formation of the three dimensional map segment.
  • the method may further comprise a step of obtaining S3 a position signal for example by means of a GPS receiver.
  • the obtained position can be used in determining the position of the camera(s) and accordingly the platform.
  • the obtained position signal can be used for forming S5 a three dimensional map segment and/or processing S6 the formed three dimensional map segment.
  • georeferenced positioning is obtained.
  • the method may further comprise a step of obtaining S4 a velocity signal related to the movement of the platform.
  • the velocity signal may for example be obtained by means of an Inertial Measurement Unit. Alternatively or as a complement, the velocity can be determined based on position differences between obtained positions.
  • the forming S5 of a three dimensional map segment and/or processing S6 of the formed three dimensional map segment can be based on the obtained velocity signal.
  • In Fig 2, an example of a processing step S6 for processing a formed three-dimensional map segment is illustrated.
  • the processing step S6 is used in a method for simultaneous navigation and mapping of a space.
  • the processing step may be used in methods for simultaneous navigation and mapping of a space as exemplified in relation to Fig 1.
  • the processing step S6 comprises a sub-step S66 of updating a three dimensional map with a formed three dimensional map segment.
  • the three dimensional map may be given in a coordinate system related to a predetermined starting or reference position.
  • the starting or reference position may be a zero coordinate.
  • the starting or reference position may be another suitable coordinate given in a local or global coordinate system.
  • the starting or reference position may be a position as obtained by a receiver of a possibly satellite based global coordinate system, such as GPS.
  • the processing step S6 further comprises sub-steps S61-S64 for estimating a position of the camera(s) based on the formed three dimensional map segment.
  • the sub-steps S61-S64 for estimating a position of the camera(s) comprises detecting S61 points or parts in the three dimensional map segment having characteristic features. In this step, at least some of the images upon which the formation of the three dimensional map segment is based are used.
  • the step of detecting S61 points or parts in the three dimensional map segment having characteristic features comprises detecting said points in a first image of the respective image set and detecting said parts or points in the second image of that image set.
  • the detecting S61 of points or parts in the three dimensional map segment having characteristic features can be made based on the obtained velocity signal.
  • the velocity signal can for example then be used for providing a rough first estimate of the field of view of the respective cameras and based thereon provide a first rough estimate of the positions of the points or parts in the image captured by the respective camera having characterizing features.
  • the searching in the images for detecting S61 points or parts in the three dimensional map segment having characteristic features can thereby be simplified, demanding less computing power.
  • the detected points or parts in the three dimensional map segment having characteristic features are then compared S62 to at least one preceding point or part detection.
  • the step of comparing S62 the detected points or parts in the three dimensional map segment having characteristic features to at least one preceding point or part detection may then comprise estimation of a rotation and a translation of a point cloud based on the detected features in the first and second images and the first and second preceding images and based on an assumption that the characteristic features are stationary.
  • Procrustes, or Procrustes analysis, will not be described in detail herein.
  • Procrustes is a form of statistical shape analysis which can be used to analyse the distribution of a set of shapes. To compare the shapes of two or more objects, the objects are translated, rotated and potentially uniformly scaled. Thus the placement in space and the size of the objects are freely adjusted. The aim is to obtain a similar placement and size, by minimizing a measure of shape difference called the Procrustes distance between the objects.
  • the Procrustes distance can be determined based on translation, rotation and scaling.
  • the Procrustes distance can be determined based on translation and rotation.
  • a movement of the camera(s) is estimated S63 based on the comparison.
  • the estimated movement can for example be represented as a vector.
  • the estimated movement comprises an estimated direction and an estimated distance.
  • the position of the camera(s) is then estimated S64 based on the estimated movement of the camera(s) and based on at least one preceding estimate of the position of the camera(s). If a position signal is obtained for example by means of a GPS receiver, this obtained position may also be used in determining the position of the camera(s). Thereby, drifting in the estimation of the position of the camera(s) can be avoided.
  • the processing step may further comprise a step of obtaining S65 an improved position estimate.
  • the estimated position is used for selecting a map part of the updated three dimensional map.
  • the improved position estimate is then determined based on the estimated position and the selected map part.
  • the representation of the formed map part may for example be a surface or a point cloud.
  • the position, disparity image(s) and/or depth map(s) related to the estimated current position of the camera(s) or the selected map part can be used for identifying a current position of the camera(s) as fit to the representation of the formed map part, the disparity images and/or depth maps.
  • the improved position estimate can be determined based on the position as identified based on the representation of the formed map part, and the disparity images and/or depth maps. This helps ensure that drift in the estimation of the position of the camera(s) is avoided.
  • the method may comprise a step of globally updating S67 the map.
  • detected points or parts of the map are summed up or weighted with previously made detections.
  • the step of globally updating S67 the map comprises accessing the latest detected points or parts in the three dimensional map segment having characteristic features, searching the map to find a match with other previously detected points or parts having said accessed characteristic features, and globally updating the map to sum up or weight coordinates of the accessed detected points or parts having the characteristic features with coordinates of corresponding found matching detected points or parts having the characteristic features.
  • the global update of the match can be made using for example an Iterative Closest Point, ICP, method.
  • This globally updating step may for example be performed upon detection that a formed map segment has coordinates coinciding with or within a predetermined distance from coordinates of the map before being updated with the formed map segment.
  • the methods for simultaneous navigation and mapping of a space based on images captured by a moving platform and/or processing step S6 for processing a formed three-dimensional map segment as disclosed herein may be implemented in software.
  • a system 300 for simultaneous navigation and mapping of a space based on images captured by a moving platform 301 is disclosed.
  • the system may be adapted for simultaneous short range navigation and mapping such as in secluded spaces.
  • the system may for example be used for mapping inside buildings.
  • the system 300 comprises image capturing means 310 arranged to capture a set of images comprising at least two two-dimensional images, said set of images being at least partly overlapping.
  • the image capturing means 310 are in one example integrated with the moving platform 301.
  • the image capturing means 310 are in one example mounted to the moving platform 301.
  • the image capturing means 310 comprises a first image capturing element 311 and a second image capturing element 312.
  • the first and second image capturing elements 311, 312 are further arranged in relation to each other such that their fields of view are at least partly overlapping.
  • the first and second image capturing elements 311, 312 are arranged at a distance from each other. Thereby the respective image capturing elements are viewing a scene from different angles.
  • the image capturing means are in one example arranged to capture images for short range three dimensional image generation. The distance between the image capturing elements is in one example below one meter.
  • the image capturing elements 311, 312 are cameras arranged to obtain images within a given wavelength range.
  • the image capturing elements 311, 312 may be arranged to capture images within the visual wavelength range.
  • the image capturing elements 311, 312 may instead be arranged to capture thermal images.
  • the image capturing elements 311, 312 may be arranged to capture images within the infrared wavelength range.
  • the image capturing element 311, 312 may be arranged to capture images in any wavelength range.
  • the image capturing elements 311, 312 may be arranged to capture radar images.
  • the image capturing elements 311, 312 are arranged to capture images within a plurality of wavelength ranges.
  • the image capturing means 310 may comprise a plurality of image capturing element pairs.
  • the different image capturing element pairs may be arranged to capture images within different wavelength ranges.
  • the image capturing means 310 comprises as discussed above one or a plurality of image capturing element pairs. However, the image capturing means may comprise image capturing element sets comprising more than two image capturing elements.
  • the system 300 further comprises a processing element 320.
  • the processing element is arranged to receive the images captured by the image capturing means.
  • the processing element is arranged to continuously obtain the position of the image capturing means 310 and to update a map based on the received images.
  • the processing element 320 is in the illustrated example arranged at the moving platform 301. Alternatively, at least parts of the processing element 320 are located at a distance from the moving platform. Data related to the captured images may then be transmitted to the remotely arranged processing element via a transmitter 380 arranged at the platform 301 and adapted to transmit data.
  • the system 300 for simultaneous navigation and mapping of a space comprises further one or a plurality of memory elements 330.
  • the memory element(s) 330 may be arranged to store software for performing simultaneous navigation and mapping of a space based on images captured by a moving platform and/or a processing step S6 for processing said formed three-dimensional map segment of a method for simultaneous navigation and mapping of a space.
  • the memory element(s) may further be arranged to store the formed three dimensional map.
  • the memory element(s) 330 may be arranged to store at least some of the captured images.
  • the memory element(s) may further be arranged to store processed images, such as disparity images and/or depth maps.
  • the memory element(s) may further be arranged to store other data such as detected points or parts in formed three dimensional map segments, wherein the detected points or parts have characteristic features.
  • the entire or parts of the memory element(s) may be formed at the platform 301.
  • the memory element(s) may be placed at the location of the processing element 320.
  • the platform may comprise memory parts arranged to store images captured by the image capturing means and/or disparity images and/or depth maps. The stored images can in addition thereto be transmitted to remote memory parts at the remote processing element.
  • the processing element 320 is arranged to form a three dimensional map segment based on the overlap part of the set of images, and to process the formed three-dimensional map segment.
  • the processing of the formed three-dimensional map segment comprises detecting points or parts in the three dimensional map segment having characteristic features, and updating a map with the formed map segment.
  • the processing element 320 is arranged to, for each processing of the formed three dimensional map segment, compare the detected points or parts in the three dimensional map segment having characteristic features to at least one preceding point or part detection. The processing element 320 is further arranged to estimate the movement of the image capturing elements based on the comparison. The processing element 320 is then arranged to estimate the position of the image capturing elements based on the estimated movement of said image capturing element(s).
  • the system 300 may further comprise a light emitting element 340.
  • the light emitting element 340 is in the illustrated example arranged at or connected to the moving platform.
  • the light emitting element 340 is arranged to project light within the field of view of the camera(s). The light may be projected while capturing a set of images. Use of projected light was also discussed in relation to figure 1.
  • the wavelength(s) of the projected light is characteristically within a range which can be registered by the camera(s).
  • the light emitting element may function as a flashlight.
  • the light emitting element is in one example arranged to emit light having characteristic features. This generally means that the light has a characteristic pattern.
  • the emitted light can then be detected within the captured images.
  • the three dimensional map segment may be formed based on the detected projected light. The forming of the three dimensional map segment may be determined based on the detected characteristic pattern in the images.
  • Other processing by the processing element 320 may be performed based on images without light projected into them, as also has been discussed above.
  • a position obtaining element 360 is arranged at the moving platform 301.
  • the position obtaining element may provide georeferenced coordinate data for geo-referencing the map.
  • the position obtaining element comprises a GPS receiver.
  • a velocity obtaining element 370 is arranged at the moving platform 301.
  • the velocity obtaining element 370 may provide velocity data for use in estimating the movement of the image capturing element(s). Use of velocity data can support the calculations such that the calculation burden is reduced.
  • the velocity obtaining element 370 may for example comprise an Inertial Measurement Unit, IMU.
  • the platform 301 comprises further a receiver 390 for receiving data.
  • the data transmitter 380 and data receiver 390 may be used for communicating with other moving platforms.
  • the respective platform may be arranged to communicate the built three dimensional map and/or its estimated position to the other platforms.
  • the processing element can be arranged to update the three dimensional map based on the three dimensional map received from another platform.
  • the updating may comprise adapting the received three dimensional map to a coordinate system as used by the platform.
  • the processing element may be arranged to obtain information related to translation of coordinates as received from the other platform to a coordinate system as used by the own platform.
  • the own platform can be arranged to translate estimated positions received from the other platforms to position coordinates in the own coordinate system.

Abstract

The present disclosure relates to a method and system for simultaneous navigation and mapping of a space based on images captured by a moving platform. The method comprises capturing (S2) a set of images comprising at least two two-dimensional images, said set of images being at least partly overlapping, forming (S5) a three dimensional map segment based on the overlap part of the set of images, and processing (S6) the formed three-dimensional map segment. The processing comprises detecting points or parts in the three dimensional map segment having characteristic features, and updating (S66) a map with the formed map segment. For each processing of the formed three dimensional map segment, the detected points or parts in the three dimensional map segment having characteristic features are compared to at least one preceding point or part detection, a movement of the platform is estimated based on the comparison, and the position of the platform is estimated based on the estimated movement of the platform.

Description

A method and system for simultaneous navigation and mapping of a space
TECHNICAL FIELD
The present disclosure relates to a method and system for simultaneous navigation and mapping of a space based on images captured by a moving platform.
BACKGROUND ART
In robotic mapping, simultaneous localization and mapping (SLAM) is the computational problem of constructing or updating a map of an unknown environment while simultaneously keeping track of the location of the robot within it. While this initially appears to be a chicken-and-egg problem, there are several algorithms known for solving it, at least approximately, for certain conditions. Approximate solution methods may include the particle filter and extended Kalman filter.
SLAM algorithms are tailored to the available resources, hence not aimed at perfection, but at operational compliance. Published approaches are employed in self-driving cars, unmanned aerial vehicles, autonomous underwater vehicles, planetary rovers, newly emerging domestic robots and even inside the human body.
Statistical techniques used to approximate the solutions include Kalman filters, particle filters (Monte Carlo methods) and scan matching of range data. These approximate solutions provide an estimation of the posterior probability function for the pose of the robot and for the parameters of the map. Set-membership techniques are mainly based on interval constraint propagation. The known techniques provide a set which encloses the pose of the robot and a set approximation of the map.
Bundle adjustment is another known technique for SLAM using image data. Bundle adjustment jointly estimates poses and landmark positions, increasing map fidelity.
SLAM algorithms may be driven by differing requirements and assumptions about the types of maps, sensors and models.
US2015/0304634 describes SLAM methods and a system for real time tracking of features.
SUMMARY
An object of the disclosure is to obtain an improved method for simultaneous navigation and mapping of a space. The method is based on use of stereo camera pairs and stereo image processing. The method may for example be implemented at an Unmanned Aerial Vehicle, UAV. A UAV refers to an engine driven aerial vehicle without a pilot or remote control. The UAV may be formed in any size. A UAV is however only an example. The method disclosed herein can be implemented on any movable object. The space may be under water. The space may be above ground.
Different embodiments of the present disclosure relate to a method for simultaneous navigation and mapping of a space based on images captured by a moving platform. The method comprises capturing a set of images comprising at least two two-dimensional images, said set of images being at least partly overlapping, forming a three dimensional map segment based on the overlap part of the set of images, and processing the formed three-dimensional map segment. The processing of the formed three-dimensional map segment comprises detecting points or parts in the three dimensional map segment having characteristic features, and updating a map with the formed map segment. For each processing of the formed three dimensional map segment, the detected points or parts in the three dimensional map segment having characteristic features are compared to at least one preceding point or part detection, a movement of the platform is estimated based on the comparison, and the position of the platform is estimated based on the estimated movement of the platform.
In accordance with the method, a three dimensional mapping of the space is built up during use of a camera arrangement arranged to capture overlapping image pairs so as to obtain depth data related to the captured images. The camera arrangement may comprise at least two cameras arranged at a distance from each other. The building of the three dimensional map may be performed in real time.
Thus, the method provides continuous updating of estimated positions. Each position is given in relation to a starting or reference position. The reference position may be the position where measurements started. The position may be given in a three dimensional coordinate system. Then, the reference position may be given a geographical coordinate determined when initiating measurements. The geographical coordinate may be given in a global coordinate system. The geographical coordinate may for example be obtained from a GPS receiver.
The method is typically very well suited for building three dimensional maps in secluded spaces such as the inside of buildings etc. The method is very well suited for building three dimensional maps where the distances to the mapped objects are tens of meters or less.
The method is very well suited for mapping static environments, wherein the captured images mainly comprise static objects. If moving objects are in the captured images, these may be filtered out and/or disregarded.
Further, built-up three dimensional maps may be exchanged between platforms. Accordingly, each platform can build three dimensional maps based also on information received from other platforms.
In different embodiments, the method further comprises a step of obtaining an improved position, wherein the estimated position is used for selecting a map part, whereupon the improved estimated position is determined based on the estimated position and an identified corresponding position in the selected map part. This helps ensure that drift in the estimation of the position of the platform is avoided.
In different embodiments, the method further comprises a step of globally updating the map. In this step, detected points or parts of the map are summed up or weighted with previously made detections. The step of globally updating the map comprises accessing the latest detected points or parts in the three dimensional map segment having characteristic features, searching the map to find a match with other previously detected points or parts having said accessed characteristic features, and globally updating the map to sum up or weight coordinates of the accessed detected points or parts having the characteristic features with coordinates of corresponding found matching detected points or parts having the characteristic features.
The global update of the match can be made using for example an Iterative Closest Point, ICP, method. This globally updating step may for example be performed upon detection that a formed map segment has coordinates coinciding with or within a predetermined distance from coordinates of the map before being updated with the formed map segment. In different embodiments, the step of detecting points or parts in the three dimensional map segment having characteristic features comprises detecting said points in a first image of the image set and detecting said parts or points in the second image of the image set.
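One way such an ICP-based global update could be realized is sketched below, assuming the existing map and the newly formed map segment are available as numpy point clouds; the function names, iteration count and convergence threshold are illustrative assumptions, not taken from the disclosure.

```python
# Minimal ICP sketch: register a new map segment (N x 3) against the existing map (M x 3).
# Illustrative only; names and thresholds are assumptions, not taken from the patent.
import numpy as np
from scipy.spatial import cKDTree

def best_fit_transform(src, dst):
    """Closed-form rigid transform (rotation R, translation t) mapping src onto dst."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)     # cross-covariance of the centred point sets
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                # guard against a reflection solution
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = dst_c - R @ src_c
    return R, t

def icp(segment, map_points, iterations=30, tol=1e-6):
    tree = cKDTree(map_points)              # nearest-neighbour search in the global map
    src = segment.copy()
    R_total, t_total = np.eye(3), np.zeros(3)
    prev_err = np.inf
    for _ in range(iterations):
        dist, idx = tree.query(src)         # closest map point for every segment point
        R, t = best_fit_transform(src, map_points[idx])
        src = src @ R.T + t                 # move the segment towards the map
        R_total, t_total = R @ R_total, R @ t_total + t
        err = dist.mean()
        if abs(prev_err - err) < tol:       # converged
            break
        prev_err = err
    return R_total, t_total                 # transform registering the segment to the map
```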
In different embodiments, the step of comparing the detected points or parts in the three dimensional map segment having characteristic features to at least one preceding point or part detection comprises estimation of a rotation and a translation of a point cloud based on the detected features in the first and second images and the first and second preceding images, so called Procrustes, and based on an assumption that the characteristic features are stationary. In different embodiments the method further comprises a step of projecting light when capturing at least a subset of the images, wherein the step of forming a three dimensional map segment is based at least partly on the images comprising the projected light, wherein said light may have a characteristic pattern.
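The Procrustes estimation referred to above can reuse the closed-form alignment from the ICP sketch; a hedged illustration follows, where prev_feats and curr_feats are assumed to be the matched characteristic feature points from the preceding and the current image sets.

```python
# Frame-to-frame motion from matched characteristic features (illustrative sketch only).
# prev_feats, curr_feats: matched 3D feature points (N x 3), assumed stationary in the scene
# (rotation and translation only, no scaling).
R, t = best_fit_transform(prev_feats, curr_feats)  # Procrustes: rotation R, translation t

# Because the features are stationary, the apparent motion of the points is the inverse of
# the camera motion: the camera rotated by R.T and translated by -R.T @ t, expressed in the
# coordinate frame of the preceding image set.
R_motion, t_motion = R.T, -R.T @ t
```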
When a light emitting element arranged to generate the projected light is mounted on the platform, occurrence of potential shadows in the images is reduced. When the light emitting element is mounted at the platform in a known relation to the platform, this information may be used. Alternatively, the light emitting element is not mounted at the platform. The light emitting element may for example be stationary located within the space.
The step of forming a three dimensional map segment then may comprise comparison of two 3D point clouds, wherein the clouds are translated/rotated to find a match. A so called Iterative Closest Point, ICP, method may then be used in the formation of the three dimensional map segment.
The step of forming a three dimensional map segment may comprise associating calibration information to the respective image of the image set. The association of calibration information to the respective image comprises associating a set of calibration parameters to the respective image. The calibration may be performed before measurements are started. The calibration may be made with regard to intrinsic parameters such as lens distortion, focal length etc. Instead or in addition thereto, the calibration may be made with regard to extrinsic parameters. The calibration can be made before measurements start and accordingly, the same calibration information can be associated to all the images.
The step of forming a three dimensional map segment may comprise rectifying the respective image. The two dimensional images can then be rectified based on the calibration information. Rectifying the images means that the images are transformed so as to be projected onto a common image plane. By way of rectification of the two dimensional images, the subsequent step of detection of matching points is facilitated. Any point in one rectified image will have a matching point along the so called epipolar line in the other images of the image set. This means that the search for a matching point is only required along a line in the other image at subsequent stereo image processing. This saves processing resources. Thus, processing speed can be increased.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig 1 is a flow chart illustrating an example of a method for simultaneous navigation and mapping of a space based on images captured by a moving platform.
Fig 2 is a flow chart illustrating an example of a processing step of a method for simultaneous navigation and mapping of a space.
Fig 3 is a block diagram illustrating an example of a system for simultaneous navigation and mapping of a space based on images captured by a moving platform.
DETAILED DESCRIPTION
Figure 1 illustrates an example of a method 100 for simultaneous navigation and mapping of a space based on images captured by a moving platform. For example, the moving platform may be a person travelling by foot or in/on a vehicle. Alternatively, the moving platform is a vehicle. The vehicular platform may for example be an aerial vehicle such as an Unmanned Aerial Vehicle, UAV. A UAV refers to an engine driven aerial vehicle without a pilot or remote control. The UAV may be formed in any size. The examples given are just examples. The method disclosed herein can be implemented on any movable platform. The method comprises a step of capturing S2 a set of images. The set of images comprises at least two two-dimensional images which are at least partly overlapping.
The method further comprises a step of forming S5 a three dimensional map segment based on the overlap part of the set of images. Techniques for forming three dimensional data based on overlapping two-dimensional images are well known in the art and are not described in detail herein.
Generally described, depth perception from stereo vision is based on the triangulation principle. The scene is captured from two different viewpoints for example by capturing the images with two cameras. For each point or part visible in both images, there are two rays in three dimensional space connecting the point or part with each camera's centre of projection. The points or parts in the respective image can be determined based on any technique known in the art. For example, the points or parts in the respective image can be determined based on feature matching, wherein characterizing features are identified in the respective image and matched. In order to obtain depth information for each point or part of the first image in an overlapping area, a corresponding point or part is determined in the second image, for example by way of feature matching. Then, the ray intersection point for corresponding points or parts of the first and second images is calculated based on a known camera geometry set up. Thus, the geometrical or spatial relation between the cameras capturing the images is known. The first and second cameras may be a left and a right camera.
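As a concrete illustration of this triangulation step, the sketch below recovers metric 3D points from corresponding image points in a calibrated left/right pair; the projection matrices and point arrays are assumed inputs with illustrative names, not values from the disclosure.

```python
# Triangulation sketch for a calibrated stereo pair (assumed inputs; illustrative only).
import numpy as np
import cv2

# P_left, P_right: 3x4 projection matrices built from the known camera geometry,
# e.g. P = K @ [R | t] with intrinsic matrix K and extrinsic rotation R / translation t.
# pts_left, pts_right: 2xN arrays of corresponding (matched) image points.
def triangulate(P_left, P_right, pts_left, pts_right):
    pts_h = cv2.triangulatePoints(P_left, P_right, pts_left, pts_right)  # 4xN homogeneous
    return (pts_h[:3] / pts_h[3]).T                                      # N x 3 metric points
```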
The step of forming the three dimensional map segment may comprise associating calibration information to the respective image of the image set.
The geometrical or spatial relation between the cameras may be computed a priori in a stereo calibration process. The stereo calibration process may involve using a calibration object. The calibration object is for example a calibration plate having a visual pattern which is known. Then, respective image pairs can be captured by two cameras having a fixed geometrical or spatial relation, showing the pattern at different positions, orientations and distances in both cameras, wherein the first image of the respective image pair originates from the first camera and the second image of the respective image pair originates from the second camera of the two cameras. The known pattern of the calibration plate and the representations of the pattern in the respective image can be used in the calibration process. For example, information related to the representation of the pattern in the respective image can be used to determine the geometrical or spatial relation between the two cameras, i.e. rotation and shift in three dimensions between the first and second camera. This is known as an extrinsic parameter. Thus, the extrinsic parameters denote the coordinate system transformations from 3D world coordinates to 3D camera coordinates. The extrinsic parameters define the position of the camera center and the camera's heading in world coordinates. Further, the captured images of the calibration object may also be used to determine intrinsic parameters of the respective camera. The intrinsic camera parameters may include parameters such as lens distortion, focal length, etc. The intrinsic and extrinsic parameters together form a stereo camera model. The intrinsic and/or extrinsic parameters can be determined by any other method known in the art.
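A compact OpenCV-style sketch of such a stereo calibration with a planar chessboard pattern is given below; the board dimensions, square size and image lists are placeholder assumptions, not values from the disclosure.

```python
# Stereo calibration sketch with a known planar calibration plate (placeholder values).
import numpy as np
import cv2

board = (9, 6)                # assumed number of inner corners of the calibration pattern
square = 0.025                # assumed square size in metres
objp = np.zeros((board[0] * board[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:board[0], 0:board[1]].T.reshape(-1, 2) * square

obj_pts, left_pts, right_pts = [], [], []
for img_l, img_r in zip(left_images, right_images):   # grayscale image pairs (assumed input)
    ok_l, corners_l = cv2.findChessboardCorners(img_l, board)
    ok_r, corners_r = cv2.findChessboardCorners(img_r, board)
    if ok_l and ok_r:
        obj_pts.append(objp)
        left_pts.append(corners_l)
        right_pts.append(corners_r)

img_size = left_images[0].shape[::-1]
# Intrinsic parameters of each camera: camera matrix K and lens distortion coefficients d.
_, K1, d1, _, _ = cv2.calibrateCamera(obj_pts, left_pts, img_size, None, None)
_, K2, d2, _, _ = cv2.calibrateCamera(obj_pts, right_pts, img_size, None, None)
# Extrinsic parameters: rotation R and translation T from the first to the second camera.
_, K1, d1, K2, d2, R, T, E, F = cv2.stereoCalibrate(
    obj_pts, left_pts, right_pts, K1, d1, K2, d2, img_size,
    flags=cv2.CALIB_FIX_INTRINSIC)
# K1, d1, K2, d2, R and T together form the stereo camera model referred to above.
```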
The calibration information associated to the respective image is used to triangulate corresponding points that have been identified in both images and recover their metric coordinates with respect to the camera setup. In practice, the formation of a three dimensional map segment characteristically involves processing the captured images associated with their calibration information so as to obtain a three dimensional point cloud or surface representing the map segment.
The step of forming the three dimensional map segment may further comprise forming rectification information for the respective image based on the calibration information. Image rectification is a transformation process used to project two or more images onto a common image plane. The rectification information represents the information upon which the transformation process is based.
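As an illustration of forming rectification information from the calibration information, the sketch below again uses OpenCV; K1, d1, K2, d2, R, T and img_size are assumed to come from a calibration such as the one sketched above, and the variable names are assumptions for the example.

```python
import cv2

# Rectification maps computed once from the calibration information.
R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, d1, K2, d2, img_size, R, T)
map1x, map1y = cv2.initUndistortRectifyMap(K1, d1, R1, P1, img_size, cv2.CV_32FC1)
map2x, map2y = cv2.initUndistortRectifyMap(K2, d2, R2, P2, img_size, cv2.CV_32FC1)

# Applied to every captured image pair: after remapping, corresponding points lie
# on the same image row, so the epipolar search reduces to a horizontal scan.
left_rect = cv2.remap(left_img, map1x, map1y, cv2.INTER_LINEAR)
right_rect = cv2.remap(right_img, map2x, map2y, cv2.INTER_LINEAR)
```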
As stated above, in order to perform the triangulation, corresponding image points or parts in the first and second images are identified. In one example, the entire second image is searched for matching points or parts. This is however generally time consuming, and would characteristically be too time consuming to be done in real time. The geometry of the two cameras allows for restricting the search to a one dimensional line in the second image, the so-called epipolar line.
Thus, the three dimensional map segment can then be formed based on the rectification information. The three dimensional map segment can in accordance therewith be formed based on matching a point in one image of the set of images with another point in another image of the set of images along the so-called epipolar line.
As discussed above, the formed three dimensional map segment may be represented by a three dimensional point cloud or surface. The three dimensional point cloud or surface may comprise texture information. Further, the captured image sets may be stored for use in subsequent steps in the method.
Further, disparity images obtained from the respective image sets when forming the three dimensional map segment may, for example, be stored. The disparity images may be formed on sub pixel level. In one example, depth maps formed based on the disparity images when forming the three dimensional map segment are stored. The disparity images and/or depth maps may be used in subsequent steps in the method, as will be discussed later.
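A hedged example of producing such sub-pixel disparity images and depth maps from a rectified image pair is given below, using OpenCV semi-global matching; the matcher settings, focal length f and baseline B are illustrative assumptions rather than values prescribed by the method.

```python
import cv2
import numpy as np

# Semi-global block matching along the rectified epipolar lines; disparities are
# returned in fixed-point format with 4 fractional bits, i.e. at sub-pixel level.
matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
disparity = matcher.compute(left_rect, right_rect).astype(np.float32) / 16.0

# Depth from disparity for a rectified pair: Z = f * B / d, where f is the focal
# length in pixels and B the baseline between the cameras (both from calibration).
f, B = 700.0, 0.12   # hypothetical values; taken from the stereo camera model in practice
valid = disparity > 0
depth = np.zeros_like(disparity)
depth[valid] = f * B / disparity[valid]
```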
The method comprises further a step of processing S6 the formed three-dimensional map segment. The processing S6 comprises updating a map with the formed three dimensional map segment. The processing further comprises estimating a position of at least one of the cameras.
Thus, in accordance with this method a continuous updating of estimated positions of the platform is provided. The estimated positions are given in relation to a starting or reference position. The reference position may be the position where measurements started. The reference position and/or estimated positions may be given in a three dimensional coordinate system connected to initiation of measurements. Further, in accordance with the method, a three dimensional map is built up simultaneously with the continuous providing of the estimated positions of the platform. The coordinates of the three dimensional map are given in relation to the starting or reference position. The processing step will be described in further detail later.

The method may comprise a step of projecting S1 light within the field of view of the camera(s). The step of projecting S1 light is performed at least when the images are captured S2, or when some of the images are captured. The processing S6 step may be arranged to determine whether light projected into the images would significantly improve the results obtained in the processing step, and to control the activation and deactivation of the light projection based thereon. Alternatively, the light is constantly projected into the field of view of the camera(s) capturing the images. Alternatively, the light is manually activated/deactivated.
Thus, the light may be projected while the step of capturing S2 a set of images is performed. The projected light may be functioning as a flashlight.
The wavelength(s) of the projected light is characteristically within a range which can be registered by the camera(s).
The projected light has in one example a characteristic pattern. The projected light can then be detected within the captured images. The three dimensional map segment may be formed based on the detected projected light. The forming of the three dimensional map segment may be determined based on the detected characteristic pattern in the images. Thereby, it is possible to build a three dimensional map segment and simultaneously navigate in spaces with very few characteristic features within the scene. For example, the method for navigating and building a three dimensional map may otherwise not be possible in scenes having plain walls without two dimensional and/or three dimensional characteristic features.
Further, the subsequent processing S6 step may also be based on the detected light in the respective images. However, as the light source characteristically is mounted to a platform comprising the cameras, the light source follows the movement of the platform. Comparing successive images with regard to characteristic patterns of the light is more complicated and demanding than comparing actual characteristics within a scene, as the projected light characteristically moves between the images while the characteristic features of the scene are static. Therefore, the projected light might even disturb the process of detecting points or parts in the three dimensional map segment having characteristic features.
In accordance with different examples, when the projected light is detected in the images, this information can be used to improve the accuracy in forming the three dimensional map segment. In one example, some of the images are captured with the projected light and some images are not. Thus, in accordance with this example, the light emitting element is alternately turned on and off. The images having the projected light may then be used as stated above for improving the accuracy in forming the three dimensional map segment. Those images which do not comprise the projected light may be used in the subsequent processing step of processing the formed three dimensional map segment. Then, any potential disturbance from the projected light is avoided in the process of detecting points or parts in the three-dimensional map segment having characteristic features. Thus, the step of forming a three dimensional map segment is based at least partly on the images comprising the projected light. The projected light preferably has a characteristic pattern or functions as a flashlight.
When the light emitting element is mounted on the platform, the occurrence of potential shadows in the images is reduced. When the light emitting element is mounted at the platform in a known relation to the platform, this information may be used.
Alternatively, the light emitting element is not mounted at the platform. The light emitting element may for example be stationarily located within the space. The step of forming a three dimensional map segment may then comprise comparison of two 3D point clouds, wherein the clouds are translated/rotated to find a match. A so-called Iterative Closest Point, ICP, method may for example then be used in the formation of the three dimensional map segment.
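A minimal ICP sketch in plain NumPy is given below for illustration; it assumes two point clouds stored as N x 3 arrays (the newly formed segment and a reference cloud) with sufficient overlap, and uses a brute-force nearest-neighbour search that would in practice be replaced by a spatial index.

```python
import numpy as np

def best_rigid_fit(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst (Kabsch)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:            # guard against a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, dst_c - R @ src_c

def icp(segment, reference, iterations=20):
    """Minimal ICP: align a newly formed map segment to a reference point cloud."""
    R_total, t_total = np.eye(3), np.zeros(3)
    moved = segment.copy()
    for _ in range(iterations):
        # Brute-force nearest neighbour in the reference cloud for every segment point.
        d2 = ((moved[:, None, :] - reference[None, :, :]) ** 2).sum(-1)
        matches = reference[d2.argmin(axis=1)]
        R, t = best_rigid_fit(moved, matches)
        moved = moved @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total   # transform taking the original segment into the reference frame
```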
The method may further comprise a step of obtaining S3 a position signal, for example by means of a GPS receiver. The obtained position can be used in determining the position of the camera(s) and accordingly the platform. Thus, the obtained position signal can be used for forming S5 a three dimensional map segment and/or processing S6 the formed three dimensional map segment. In providing a position signal by means of a GPS receiver, georeferenced positioning is obtained.
The method may further comprise a step of obtaining S4 a velocity signal related to the movement of the platform. The velocity signal may for example be obtained by means of an Inertial Measurement Unit. Alternatively or as a complement, the velocity can be determined based on position differences between obtained positions. The forming S5 of a three dimensional map segment and/or processing S6 of the formed three dimensional map segment can be based on the obtained velocity signal.

In figure 2, an example of a processing step S6 for processing a formed three-dimensional map segment is illustrated. The processing step S6 is used in a method for simultaneous navigation and mapping of a space. The processing step may be used in methods for simultaneous navigation and mapping of a space as exemplified in relation to Fig 1. The processing step S6 comprises a sub-step S66 of updating a three dimensional map with a formed three dimensional map segment. The three dimensional map may be given in a coordinate system related to a predetermined starting or reference position. The starting or reference position may be a zero coordinate. The starting or reference position may be another suitable coordinate given in a local or global coordinate system. For example, the starting or reference position may be a position as obtained by a receiver of a, possibly satellite based, global coordinate system, such as GPS.
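As a simple illustration of the map update S66 in the reference coordinate system, the sketch below transforms a point-cloud segment from the camera frame into the reference frame using the current estimated camera pose and appends it to the map; the names and the plain point-cloud representation of the map are assumptions made for the example.

```python
import numpy as np

def update_map(map_points, segment_points, R_cam, t_cam):
    """Append a newly formed map segment to the three dimensional map.

    map_points     : N x 3 array of map points in the reference coordinate system.
    segment_points : M x 3 array of segment points in the camera coordinate system.
    R_cam, t_cam   : estimated camera orientation and position relative to the
                     starting or reference position.
    """
    # Express the segment in the reference frame before appending it to the map.
    in_reference_frame = segment_points @ R_cam.T + t_cam
    return np.vstack([map_points, in_reference_frame])
```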
The processing step S6 further comprises sub-steps S61-S64 for estimating a position of the camera(s) based on the formed three dimensional map segment.
The sub-steps S61-S64 for estimating a position of the camera(s) comprise detecting S61 points or parts in the three dimensional map segment having characteristic features. In this step, at least some of the images upon which the formation of the three dimensional map segment is based are used. The step of detecting S61 points or parts in the three dimensional map segment having characteristic features comprises detecting said points in a first image of the respective image set and detecting said parts or points in the second image of that image set.
When a velocity signal has been obtained, for example by means of an Inertial Measurement Unit, the detecting S61 of points or parts in the three dimensional map segment having characteristic features can be made based on the obtained velocity signal. The velocity signal can for example then be used for providing a rough first estimate of the field of view of the respective cameras and, based thereon, a first rough estimate of the positions of the points or parts having characterizing features in the image captured by the respective camera. The searching in the images for detecting S61 points or parts in the three dimensional map segment having characteristic features can thereby be simplified, demanding less computing power. The detected points or parts in the three dimensional map segment having characteristic features are then compared S62 to at least one preceding point or part detection.
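A possible realisation of the detection step S61 is sketched below with ORB features in OpenCV; the matching is performed over the full images here, whereas a velocity-based prediction of the field of view could be used to restrict the search region as described above. The detector choice and parameters are illustrative assumptions.

```python
import cv2

orb = cv2.ORB_create(nfeatures=1000)

# Detect characteristic points in the first and second image of the image set.
kp1, des1 = orb.detectAndCompute(first_img, None)
kp2, des2 = orb.detectAndCompute(second_img, None)

# Match the descriptors; cross-checking keeps only mutually best matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

# Pixel coordinates of matched characteristic points in the two images,
# ready to be triangulated into points of the three dimensional map segment.
pts1 = [kp1[m.queryIdx].pt for m in matches]
pts2 = [kp2[m.trainIdx].pt for m in matches]
```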
The step of comparing S62 the detected points or parts in the three dimensional map segment having characteristic features to at least one preceding point or part detection may then comprise estimation of a rotation and a translation of a point cloud based on the detected features in the first and second images and the first and second preceding images and based on an assumption that the characteristic features are stationary. Such a process is known in the art and generally goes under the term Procrustes analysis, and will not be described in detail herein. Generally, however, Procrustes analysis is a form of statistical shape analysis which can be used to analyse the distribution of a set of shapes. To compare the shapes of two or more objects, the objects are translated, rotated and potentially uniformly scaled. Thus the placement in space and the size of the objects are freely adjusted. The aim is to obtain a similar placement and size, by minimizing a measure of shape difference called the Procrustes distance between the objects. As is understood from the above, the Procrustes distance can be determined based on translation, rotation and scaling.
Alternatively, the Procrustes distance can be determined based on translation and rotation.
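The rotation and translation can for example be recovered with the same SVD-based rigid fit used inside the ICP sketch above; the sketch below additionally returns the Procrustes distance and notes how the camera motion follows from the point-cloud motion under the stationary-scene assumption. Variable names and the omission of scaling (metric scale is already given by the stereo set-up) are assumptions for the example.

```python
import numpy as np

def procrustes_motion(prev_pts, curr_pts):
    """Rotation R, translation t and Procrustes distance between two matched sets
    of 3D feature points (rows correspond); scaling is omitted because the stereo
    set-up already gives metric coordinates.
    """
    p_mean, c_mean = prev_pts.mean(axis=0), curr_pts.mean(axis=0)
    H = (prev_pts - p_mean).T @ (curr_pts - c_mean)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # exclude reflections
    R = Vt.T @ D @ U.T
    t = c_mean - R @ p_mean
    residual = np.linalg.norm(curr_pts - (prev_pts @ R.T + t))   # Procrustes distance
    return R, t, residual

# Under the assumption that the scene features are stationary, the apparent motion of
# the feature cloud is the opposite of the camera motion: if the features appear to
# move by (R, t) in the camera frame, the camera has moved by (R.T, -R.T @ t).
```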
Further, a movement of the camera(s) is estimated S63 based on the comparison. The estimated movement can for example be represented as a vector. The estimated movement comprises an estimated direction and an estimated distance.
The position of the camera(s) is then estimated S64 based on the estimated movement of the camera(s) and based on at least one preceding estimate of the position of the camera(s). If a position signal is obtained, for example by means of a GPS receiver, this obtained position may also be used in determining the position of the camera(s). Thereby, drifting in the estimation of the position of the camera(s) can be avoided.
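For illustration, the sketch below chains the estimated frame-to-frame movement onto the preceding position estimate and shows one simple way an obtained position signal could be blended in to counteract drift; the blending weight is a hypothetical choice, not something prescribed by the method.

```python
import numpy as np

def update_pose(R_prev, t_prev, R_step, t_step, gps_position=None, gps_weight=0.1):
    """Chain an estimated camera movement onto the preceding pose estimate.

    R_prev, t_prev : preceding orientation and position in the reference frame.
    R_step, t_step : estimated movement since the preceding estimate, expressed in
                     the preceding camera frame (direction and distance).
    gps_position   : optional externally obtained position used to limit drift.
    """
    R_new = R_prev @ R_step
    t_new = R_prev @ t_step + t_prev
    if gps_position is not None:
        # Pull the estimate towards the obtained position signal.
        t_new = (1.0 - gps_weight) * t_new + gps_weight * np.asarray(gps_position)
    return R_new, t_new
```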
The processing step may further comprise a step of obtaining S65 an improved position estimate. In this step, the estimated position is used for selecting a map part of the updated three dimensional map. The improved position estimate is then determined based on the estimated position and the selected map part. In detail, the representation of the formed map part, such as a surface or point cloud, together with the position, disparity image(s) and/or depth map(s) related to the estimated current position of the camera(s) or the selected map part, can be used for identifying a current position of the camera(s) that fits the representation of the formed map part, the disparity images and/or the depth maps. The improved position estimate can then be determined based on the position as identified from the representation of the formed map part and the disparity images and/or depth maps. This helps to ensure that drifting in the estimation of the position of the camera(s) is avoided.
Further, the method may comprise a step of globally updating S67 the map. In this step, detected points or parts of the map are summed up or weighted with previously made detections. The step of globally updating S67 the map comprises the steps of accessing the latest detected points or parts in the three dimensional map segment having characteristic features, searching the map to find a match with other previously detected points or parts having said accessed characteristic features, and globally updating the map to sum up or weight coordinates of the accessed detected points or parts having the characteristic features with coordinates of corresponding found matching detected points or parts having the characteristic features. The global update of the match can be made using for example an Iterative Closest Point, ICP, method. This globally updating step may for example be performed upon detection that a formed map segment has coordinates coinciding with, or within a predetermined distance from, coordinates of the map before being updated with the formed map segment.
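One conceivable realisation of summing up or weighting matched detections is sketched below in NumPy; the distance threshold and the equal weighting of old and new detections are assumptions made for the example.

```python
import numpy as np

def globally_update(map_feature_points, new_feature_points, max_distance=0.05):
    """Merge newly detected characteristic points into previously detected ones.

    Points closer than max_distance (metres) to an existing map feature are treated
    as re-detections of the same point and averaged with it; the rest are added.
    """
    if len(map_feature_points) == 0:
        return np.asarray(new_feature_points)
    updated = map_feature_points.copy()
    added = []
    for p in new_feature_points:
        d = np.linalg.norm(updated - p, axis=1)
        i = d.argmin()
        if d[i] < max_distance:
            updated[i] = 0.5 * (updated[i] + p)   # weight old and new coordinates equally
        else:
            added.append(p)
    if added:
        updated = np.vstack([updated, added])
    return updated
```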
The methods for simultaneous navigation and mapping of a space based on images captured by a moving platform and/or processing step S6 for processing a formed three-dimensional map segment as disclosed herein may be implemented in software.
In figure 3, a system 300 for simultaneous navigation and mapping of a space based on images captured by a moving platform 301 is disclosed. The system may be adapted for simultaneous short range navigation and mapping such as in secluded spaces. The system may for example be used for mapping inside buildings.
The system 300 comprises image capturing means 310 arranged to capture a set of images comprising at least two two-dimensional images, said set of images being at least partly overlapping. The image capturing means 310 are in one example integrated with the moving platform 301. The image capturing means 310 are in one example mounted to the moving platform 301. In the illustrated example, the image capturing means 310 comprises a first image capturing element 311 and a second image capturing element 312. The first and second image capturing elements 311, 312 are further arranged in relation to each other such that their fields of view are at least partly overlapping. The first and second image capturing elements 311, 312 are arranged at a distance from each other. Thereby the respective image capturing elements view a scene from different angles. The image capturing means are in one example arranged to capture images for short range three dimensional image generation. The distance between the image capturing elements is in one example below one meter.
In one example the image capturing elements 311, 312 are cameras arranged to obtain images within a given wavelength range. The image capturing elements 311, 312 may be arranged to capture images within the visual wavelength range. The image capturing elements 311, 312 may instead be arranged to capture thermal images. Accordingly, the image capturing elements 311, 312 may be arranged to capture images within the infrared wavelength range. The image capturing elements 311, 312 may be arranged to capture images in any wavelength range. For instance the image capturing elements 311, 312 may be arranged to capture radar images. In different examples, the image capturing elements 311, 312 are arranged to capture images within a plurality of wavelength ranges.
The image capturing means 310 may comprise a plurality of image capturing element pairs. The different image capturing element pairs may be arranged to capture images within different wavelength ranges.
The image capturing means 310 comprises as discussed above one or a plurality of image capturing element pairs. However, the image capturing means may comprise image capturing element sets comprising more than two image capturing elements.
The system 300 further comprises a processing element 320. The processing element is arranged to receive the images captured by the image capturing means. The processing element is arranged to continuously obtain the position of the image capturing means 310 and to update a map based on the received images. The processing element 320 is in the illustrated example arranged at the moving platform 301. Alternatively, at least parts of the processing element 320 are located at a distance from the moving platform. Data related to the captured images may then be transmitted to the remotely arranged processing element via a data transmitter 380 arranged at the platform 301 and adapted to transmit data.
In the illustrated example, the system 300 for simultaneous navigation and mapping of a space comprises further one or a plurality of memory elements 330. The memory element(s) 330 may be arranged to store software for performing simultaneous navigation and mapping of a space based on images captured by a moving platform and/or a processing step S6 for processing said formed three-dimensional map segment of a method for simultaneous navigation and mapping of a space. The memory element(s) may further be arranged to store the formed three dimensional map. The memory element(s) 330 may be arranged to store at least some of the captured images. The memory element(s) may further be arranged to store processed images, such as disparity images and/or depth maps. The memory element(s) may further be arranged to store other data such as detected points or parts in formed three dimensional map segments, wherein the detected points or parts have characteristic features. The entire or parts of the memory element(s) may be formed at the platform 301. The memory element(s) may be placed at the location of the processing element 320. The platform may comprise memory parts arranged to store images captured by the image capturing means and/or disparity images and/or depth maps. The stored images can in addition thereto be transmitted to remote memory parts at the remote processing element.
In detail, the processing element 320 is arranged to form a three dimensional map segment based on the overlap part of the set of images, and to process the formed three-dimensional map segment. The processing of the formed three-dimensional map segment comprises detecting points or parts in the three dimensional map segment having characteristic features, and updating a map with the formed map segment. The processing element 320 is arranged to, for each processing of the formed three dimensional map segment, compare the detected points or parts in the three dimensional map segment having characteristic features to at least one preceding point or part detection. The processing element 320 is further arranged to estimate the movement of the image capturing elements based on the comparison. The processing element 320 is then arranged to estimate the position of the image capturing elements based on the estimated movement of said image capturing element(s).

The system 300 may further comprise a light emitting element 340. The light emitting element 340 is in the illustrated example arranged at or connected to the moving platform. The light emitting element 340 is arranged to project light within the field of view of the camera(s). The light may be projected while capturing a set of images. Use of projected light was also discussed in relation to figure 1.
The wavelength(s) of the projected light is characteristically within a range which can be registered by the camera(s). The light emitting element may function as a flashlight. The light emitting element is in one example arranged to emit light having characteristic features. This generally means that the light has a characteristic pattern. The emitted light can then be detected within the captured images. The three dimensional map segment may be formed based on the detected projected light. The forming of the three dimensional map segment may be determined based on the detected characteristic pattern in the images. Other processing by the processing element 320 may be performed based on images without light projected into them, as also has been discussed above.
In the illustrated example, a position obtaining element 360 is arranged at the moving platform 301. The position obtaining element may provide georeferenced coordinate data for geo-referencing the map. In one example, the position obtaining element comprises a GPS receiver. In the illustrated example, a velocity obtaining element 370 is arranged at the moving platform 301. The velocity obtaining element 370 may provide velocity data for use in estimating the movement of the image capturing element(s). Use of velocity data can support the calculations such that the computational burden is reduced. The velocity obtaining element 370 may for example comprise an Inertial Measurement Unit, IMU.

In the illustrated example, the platform 301 further comprises a receiver 390 for receiving data. The data transmitter 380 and data receiver 390 may be used for communicating with other moving platforms. The respective platform may be arranged to communicate the built three dimensional map and/or its estimated position to the other platforms. When receiving a three dimensional map as built by another platform, the processing element can be arranged to update the three dimensional map based on the three dimensional map received from the other platform. The updating may comprise adapting the received three dimensional map to a coordinate system as used by the platform. In doing that, the processing element may be arranged to obtain information related to translation of coordinates as received from the other platform to the coordinate system used by the own platform. When this translation of coordinates has been done in relation to another platform, the own platform can be arranged to translate estimated positions received from that other platform to position coordinates in the own coordinate system.

Claims

1. A method for simultaneous navigation and mapping of a space based on images
captured by a moving platform, said method comprising capturing (S2) a set of images comprising at least two two-dimensional images, said set of images being at least partly overlapping, forming (S5) a three dimensional map segment based on the overlap part of the set of images, and processing (S6) the formed three-dimensional map segment, said processing comprising
detecting (S61) points or parts in the three dimensional map segment having characteristic features, and updating (S66) a map with the formed map segment, wherein for each processing (S6) of the formed three dimensional map segment, the detected points or parts in the three dimensional map segment having characteristic features are compared (S62) to at least one preceding point or part detection and a movement of the platform is estimated (S63) based on the comparison, and the position of the platform is estimated (S64) based on the estimated movement of the platform.
2. The method according to claim 1, further comprising a step of obtaining (S65) an
improved position, wherein the estimated position is used for selecting a map part whereupon the improved estimated position is determined based on the estimated position and an identified corresponding position in the selected map part.
3. The method according to any of the preceding claims, further comprising a step of globally updating the map, said step comprising accessing the latest detected points or parts in the three dimensional map segment having characteristic features, searching the map to find a match with other previously detected points or parts having said accessed characteristic features, and globally updating the map to sum up or weight coordinates of the accessed detected points or parts having the characteristic features with coordinates of at least one corresponding found matching detected point or part having the characteristic features.
4. The method according to any of the preceding claims, wherein the step of detecting (S61) points or parts in the three dimensional map segment having characteristic features comprises detecting said points in a first image of the image set and detecting said parts or points in the second image of the image set.
5. The method according to claim 4, wherein the step of comparing (S62) the detected points or parts in the three dimensional map segment having characteristic features to at least one preceding point or part detection comprises estimation of a rotation and a translation of a point cloud based on the detected features in the first and second images and the first and second preceding images and based on an assumption that the characteristic features are stationary.
6. The method according to any of the claims 1 - 4, further comprising a step of projecting light (S1) when capturing at least a subset of the images, wherein the step of forming (S5) a three dimensional map segment is based on the images comprising the projected light, based on the overlap part of the set of images, wherein said light may have a characteristic pattern.
7. The method according to any of the preceding claims, wherein the step of forming (S5) the three dimensional map segment comprises associating calibration information to the respective image of the image set and forming rectification information for the respective image based on the calibration information.
8. The method according to claim 7, wherein the three dimensional map segment is
formed based on the rectification information.
9. The method according to claim 8, wherein the three dimensional map segment is
formed based on matching a point in one image of the set of images with another point in another image of the set of the images along the so called epipolar line.
10. The method according to any of the preceding claims, further comprising a step of
obtaining (S4) a velocity signal for example by means of an Inertial Measurement Unit and to support the detecting (S61) of points or parts in the three dimensional map segment having characteristic features based on the obtained velocity signal.
11. The method according to any of the preceding claims, further comprising a step
obtaining (S3) a position signal for example by means of a GPS receiver, and to use the obtained position in determining the position of the platform.
12. The method according to any of the preceding claims, wherein sets of images comprising at least two two-dimensional images, said set of images being at least partly
overlapping, are captured.
13. Software for performing simultaneous navigation and mapping of a space, said software being adapted to perform the method according to any of the claims 1 - 12.
14. A system (300) for simultaneous navigation and mapping of a space based on images captured by a moving platform (301), said system comprising image capturing means (310) arranged to capture a set of images comprising at least two two-dimensional images, said set of images being at least partly overlapping, and a processing element (320) arranged to form a three dimensional map segment based on the overlap part of the set of images, and process the formed three-dimensional map segment, said processing comprising to detect points or parts in the three dimensional map segment having characteristic features, and to update a three dimensional map with the formed three dimensional map segment, wherein the processing element is arranged to, for each processing of the formed three dimensional map segment: compare the detected points or parts in the three dimensional map segment having characteristic features to at least one preceding point or part detection and estimate a movement of the platform based on the comparison, and estimate the position of the platform based on the estimated movement of said image capturing means.
15. The system according to claim 14, further comprising a memory element (330) arranged to store at least said three dimensional map.
PCT/SE2017/051341 2017-12-22 2017-12-22 A method and system for simultaneous navigation and mapping of a space WO2019125262A1 (en)


