CN112967340A - Simultaneous positioning and map construction method and device, electronic equipment and storage medium


Info

Publication number
CN112967340A
CN112967340A
Authority
CN
China
Prior art keywords
image frame
current image
pose
acquired
pose information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110169094.5A
Other languages
Chinese (zh)
Inventor
丁歆甯
李琳
周冰
周效军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, MIGU Culture Technology Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN202110169094.5A
Publication of CN112967340A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/23 Updating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Abstract

The embodiment of the invention provides a simultaneous localization and mapping method and apparatus, an electronic device, and a storage medium. The method comprises the following steps: performing pose estimation on the current image frame by sparse image alignment to obtain pose information of the vision sensor at the moment the current image frame is acquired; determining whether the current image frame is a key image frame according to that pose information; and, when the current image frame is a key image frame, updating the constructed map according to the current image frame. The invention exploits the pronounced grayscale variation caused by the strong light-dark contrast of indoor environments: during pose estimation, feature points are obtained by corner detection, and the pose information is then estimated from the photometric difference at each feature point, without computing descriptors for the feature points or using them for image matching, which saves computation.

Description

Simultaneous positioning and map construction method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of augmented reality, and in particular to a simultaneous localization and mapping method and apparatus, an electronic device, and a storage medium.
Background
With the continuous development of artificial intelligence technology, the demand for robots that can move autonomously in various environments keeps growing, and research on intelligent mobile robots has received extensive attention from researchers in the artificial intelligence field. Simultaneous Localization and Mapping (SLAM) is one of the key technologies for autonomous robot localization: data about the external environment are acquired by sensors carried on the mobile platform, a mathematical model of the surroundings is computed, and the pose of the platform is estimated, thereby achieving autonomous localization.
Existing simultaneous localization and mapping methods, implemented in systems such as ORB-SLAM, require computing a large number of feature points per image and matching images against one another, which is computationally expensive. Smart devices in everyday use, such as smartphones, have limited resources and therefore struggle to run such methods in real time.
Disclosure of Invention
In view of the problems in the prior art, the invention provides a simultaneous localization and mapping method and apparatus, an electronic device, and a storage medium.
In a first aspect, the present invention provides a simultaneous localization and mapping method, comprising:
performing pose estimation on the current image frame by sparse image alignment to obtain pose information of the vision sensor at the moment the current image frame is acquired;
determining whether the current image frame is a key image frame according to the pose information of the vision sensor at the moment the current image frame is acquired;
and, when the current image frame is a key image frame, updating the constructed map according to the current image frame.
According to the simultaneous localization and mapping method provided by the invention, performing pose estimation on the current image frame by sparse image alignment to obtain the pose information of the vision sensor at the moment the current image frame is acquired comprises:
selecting corner points in the current image frame as feature points;
minimizing the photometric errors of all the feature points in the current image frame to obtain the relative pose between the current image frame and the previous image frame;
and obtaining the pose information of the vision sensor at the moment the current image frame is acquired from the relative pose between the two frames and the pose information of the vision sensor at the moment the previous image frame was acquired.
According to the simultaneous localization and mapping method provided by the invention, minimizing the photometric errors of all the feature points in the current image frame to obtain the relative pose between the current image frame and the previous image frame comprises:
step S1, determining the first pixel point in the previous image frame that corresponds to a first feature point, the first feature point being any feature point in the current image frame;
step S2, determining a first pixel block centered on the first feature point and a second pixel block centered on the first pixel point, and taking the photometric error between the first pixel block and the second pixel block as the photometric error of the first feature point;
and step S3, obtaining the photometric errors of all the feature points in the current image frame by repeating steps S1 and S2, and minimizing these photometric errors to obtain the relative pose between the current image frame and the previous image frame.
According to the simultaneous localization and mapping method provided by the invention, performing pose estimation on the current image frame by sparse image alignment to obtain the pose information of the vision sensor at the moment the current image frame is acquired further comprises:
performing nonlinear least-squares optimization, using a bundle adjustment algorithm, on the pose information of the vision sensor at the moment the current image frame is acquired.
According to the simultaneous localization and mapping method provided by the invention, determining whether the current image frame is a key image frame according to the pose information of the vision sensor at the moment the current image frame is acquired comprises:
determining that the current image frame is a key image frame when it simultaneously satisfies the following three conditions:
the number of feature points in the current image frame is greater than a first threshold, and the number of feature points on any single object in the current image frame is less than or equal to a second threshold;
the number of image frames between the current image frame and the previous key image frame is greater than or equal to a third threshold and less than or equal to a fourth threshold;
the value of the relative pose between the current image frame and the previous image frame is less than or equal to a fifth threshold.
According to the simultaneous localization and mapping method provided by the invention, the method further comprises:
when the current image frame is a key image frame, performing loop detection on the current image frame based on a bag-of-words model, wherein the feature data in the bag-of-words model are stored in binary format.
According to the simultaneous localization and mapping method provided by the invention, before the step of performing pose estimation on the current image frame by sparse image alignment, the method further comprises:
when the current image frame is the second frame acquired by the vision sensor in a shooting session and the previous image frame is the first frame acquired in that session, obtaining depth information of the current image frame and the previous image frame;
constructing a map according to the depth information;
and computing the rotation matrix and translation matrix between the current image frame and the previous image frame from the depth information to obtain the pose information of the vision sensor at the moment the current image frame is acquired, and generating a motion trajectory from that pose information.
In a second aspect, the present invention provides a simultaneous localization and mapping apparatus, comprising:
a pose estimation module, configured to perform pose estimation on the current image frame by sparse image alignment to obtain pose information of the vision sensor at the moment the current image frame is acquired;
a key image frame determining module, configured to determine whether the current image frame is a key image frame according to the pose information of the vision sensor at the moment the current image frame is acquired;
and a map updating module, configured to update the constructed map according to the current image frame when the current image frame is a key image frame.
In a third aspect, the present invention provides an electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the simultaneous localization and mapping method according to the first aspect of the present invention.
In a fourth aspect, the present invention provides a non-transitory computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program performs the steps of the simultaneous localization and mapping method according to the first aspect of the present invention.
The simultaneous localization and mapping method and apparatus, electronic device, and storage medium provided by the embodiments of the invention exploit the pronounced grayscale variation caused by the strong light-dark contrast of indoor environments: during pose estimation, feature points are obtained by corner detection, and the pose information is then estimated from the photometric difference at each feature point; no descriptors need to be computed for the feature points and no feature-based image matching is required, which saves computation.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings required for the description of the embodiments are briefly introduced below. It is apparent that the drawings described below show only some embodiments of the present invention, and that those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of the simultaneous localization and mapping method provided by the present invention;
FIG. 2 is a schematic diagram of the simultaneous localization and mapping apparatus provided by the present invention;
FIG. 3 is a schematic diagram of the physical structure of an electronic device according to the present invention.
Detailed Description
In order to make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments are described below clearly and completely with reference to the accompanying drawings. It is apparent that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments that those skilled in the art can derive from the given embodiments without creative effort fall within the protection scope of the present invention.
Before the method of the present invention is described in detail, the simultaneous localization and mapping methods of the prior art are outlined.
A prior-art simultaneous localization and mapping method mainly comprises the following steps:
step S1, an image sequence is acquired.
In this step, a sequence of images consisting of successive images may be acquired by the vision sensor.
Step S2, based on each image in the image sequence, implements estimation of the position of the moving object at different times.
In this step, it is necessary to calculate feature points and descriptors for each image, then to implement matching between the images based on the feature points and descriptors, and to determine the positions of the moving object at different times according to the matching result.
And step S3, optimizing the position estimation result of the moving target and eliminating the accumulated error.
And step S4, selecting image key frames from the image sequence according to the constraint rule.
And step S5, constructing or updating the three-dimensional map based on the selected image key frame.
And step S6, loop detection, wherein the loop detection comprises eliminating the space accumulation error of the motion trail of the moving target and updating the position of the moving target on the three-dimensional map.
The above outlines the basic steps of prior-art simultaneous localization and mapping methods.
As these steps show, prior-art methods must compute feature points and descriptors for each image and then match images based on them. As is known to those skilled in the art, computing feature points and descriptors for an image is computationally expensive, and matching images based on them also consumes substantial computing resources, so prior-art simultaneous localization and mapping methods place high demands on computing resources.
In the loop detection step, the similarity between key image frames must be computed based on a bag-of-words model. Prior-art bag-of-words models store their data in a txt file, and reading a txt file consumes considerable time, which hurts the real-time performance of the method.
In view of these problems of the prior art, the present invention provides a simultaneous localization and mapping method for everyday indoor environments.
Fig. 1 is a flowchart of the simultaneous localization and mapping method provided by the present invention. As shown in Fig. 1, the method comprises:
Step 101: perform pose estimation on the current image frame by sparse image alignment to obtain pose information of the vision sensor at the moment the current image frame is acquired.
Simultaneous localization and mapping requires a vision sensor to capture images. In this embodiment the vision sensor is a monocular camera; in other embodiments it may be another type of vision sensing device, such as a binocular camera or another camera. The image captured by the vision sensor and processed by the method at the current moment is the current image frame; in the image sequence captured by the vision sensor, the frame immediately preceding the current image frame is the previous image frame.
Pose information refers to the rotation and translation of the vision-sensor coordinate system relative to a reference coordinate system (for example, the world coordinate system), and is typically represented by a rotation matrix and a translation matrix. The pose information of the vision sensor may differ from moment to moment.
The simultaneous localization and mapping method is intended mainly for everyday indoor environments. Indoor environments exhibit strong contrast between light and dark, so the method exploits the resulting pronounced grayscale variation and estimates the pose of the current image frame by sparse image alignment. Specifically, this comprises the following steps:
selecting corner points in the current image frame as feature points;
minimizing the photometric errors of all the feature points in the current image frame to obtain the relative pose between the current image frame and the previous image frame;
and obtaining the pose information of the vision sensor at the moment the current image frame is acquired from the relative pose between the two frames and the pose information of the vision sensor at the moment the previous image frame was acquired.
In this embodiment, the FAST corner detection method may be used to select corner points in the current image frame. Its basic principle is as follows: taking a given pixel in the image as the center of a circle of preset radius, the intensity difference between the pixels on that circle and the center pixel is computed; if the difference exceeds a preset threshold, the center pixel is a corner point, i.e., a feature point in the sense of the present invention.
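As an illustration only (the patent does not prescribe any particular library), the following sketch shows how such corners might be selected with OpenCV's FAST detector; the threshold and the cap on the number of points are assumed values, not taken from the patent.

```python
# A minimal sketch of feature-point selection with FAST, assuming OpenCV is
# available and the frame is already a grayscale image.
import cv2

def select_feature_points(gray_frame, threshold=20, max_points=200):
    """Detect FAST corners and keep the strongest responses as feature points."""
    detector = cv2.FastFeatureDetector_create(threshold=threshold,
                                              nonmaxSuppression=True)
    keypoints = detector.detect(gray_frame, None)
    # Keep only the strongest corners so a single frame is not over-sampled.
    keypoints = sorted(keypoints, key=lambda kp: kp.response, reverse=True)
    return keypoints[:max_points]
```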
After the feature points of the current image frame have been obtained, the photometric errors of all feature points in the current image frame are minimized, yielding the relative pose between the current image frame and the previous image frame.
Taking an arbitrary feature point in the current image frame as an example, its photometric error is computed as follows:
determining the first pixel point in the previous image frame that corresponds to the first feature point, the first feature point being any feature point in the current image frame;
determining a first pixel block centered on the first feature point (for example, a 4 × 4 pixel block) and a second pixel block centered on the first pixel point (likewise a 4 × 4 pixel block), and taking the photometric error between the first pixel block and the second pixel block as the photometric error of the first feature point.
Following the above steps, the photometric errors of the feature points in the current image frame can be computed one by one. The photometric errors of all feature points between the current image frame and the previous image frame are then minimized to obtain the relative pose between the two frames. For the minimization, the sparse 4 × 4-patch photometric error between the two frames can serve as the loss function, which can be solved with a Gauss-Newton (G-N) optimization algorithm.
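For illustration, a minimal sketch of the per-point patch residual is given below, assuming grayscale frames stored as NumPy arrays; the full system would sum these residuals over all feature points and minimize the sum over the relative pose with Gauss-Newton, which is omitted here.

```python
# A minimal sketch of the sparse 4x4-patch photometric error for one feature
# point. Frames are NumPy grayscale arrays; (u, v) are integer pixel coords.
import numpy as np

def patch(img, u, v, half=2):
    """Extract the 4x4 pixel block around (u, v) (even size, so not exactly centered)."""
    return img[v - half:v + half, u - half:u + half].astype(np.float64)

def photometric_error(curr_img, feat_uv, prev_img, proj_uv):
    """Sum of squared intensity differences between the two 4x4 blocks."""
    a = patch(curr_img, *feat_uv)   # block around the feature point (current frame)
    b = patch(prev_img, *proj_uv)   # block around the corresponding pixel (previous frame)
    return float(np.sum((a - b) ** 2))
```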
As described above, to exploit the strong light-dark contrast of indoor environments, this step obtains feature points with the FAST (Features from Accelerated Segment Test) corner detection method and then estimates the pose information from the photometric difference at each feature point. Compared with the prior art, no descriptors need to be computed for the feature points and no feature-based image matching is required, which greatly reduces the amount of computation.
Once the relative pose between the current image frame and the previous image frame has been obtained, the pose information of the vision sensor at the moment the current image frame is acquired can be computed from the known pose information of the vision sensor at the moment the previous image frame was acquired.
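Concretely, this amounts to composing rigid-body transforms. The sketch below, which assumes a world-from-camera convention (the patent does not fix one), chains the previous world pose with the estimated frame-to-frame relative pose:

```python
# A minimal sketch of pose chaining with 4x4 homogeneous transforms.
import numpy as np

def make_T(R, t):
    """Assemble a 4x4 homogeneous transform from rotation R (3x3) and translation t (3,)."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = np.asarray(t).ravel()
    return T

def compose_pose(T_world_prev, T_prev_curr):
    """World pose of the current frame = previous world pose composed with relative pose."""
    return T_world_prev @ T_prev_curr
```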
As is known to those skilled in the art, pose information is estimated sequentially in the order in which the image frames are captured, so the pose information of the vision sensor at the moment the previous image frame was acquired has already been computed and can be used directly in this step. For the image frames acquired at the start of a shooting session, the computation of the corresponding pose information is described in a later embodiment.
The pose information obtained in this step can be put to further use. For example, based on the pose information of the vision sensor at the moment the current image frame is acquired, points in the map can be re-projected into the current image frame, and the positions of the projected points can then be corrected to obtain the coordinates of the three-dimensional points observed by the vision sensor at that moment.
Correcting the position of a projected point means comparing its position in the current image frame with the position of the corresponding point in the map and adjusting the map point according to the result. The corrected coordinates are the coordinates of the three-dimensional point observed by the vision sensor at the moment the current image frame is acquired. Since the projected points are usually feature points, this process is also called feature alignment. The feature-aligned data may be used when the motion trajectory is updated.
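The re-projection itself is a standard pinhole projection. A minimal sketch follows, in which the intrinsic matrix K and the camera-from-world pose are assumed inputs:

```python
# A minimal sketch of re-projecting a 3-D map point into the current frame.
import numpy as np

def reproject(point_world, T_cam_world, K):
    """Return the pixel (u, v) where a world point lands in the current frame."""
    p = T_cam_world @ np.append(point_world, 1.0)   # world -> camera coordinates
    u, v, w = K @ p[:3]                             # pinhole projection
    return np.array([u / w, v / w])                 # normalize to pixel coords
```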
Step 102: determine whether the current image frame is a key image frame according to the pose information of the vision sensor at the moment the current image frame is acquired.
The image frames acquired by the vision sensor amount to a large volume of data; if every acquired image were processed when computing the motion trajectory of the moving target or updating the map, a large amount of computing resources would be consumed. In this step, therefore, the image frames are screened, and only the frames that satisfy the conditions are used to update the trajectory and the map in the subsequent steps.
This step does not restrict how the current image frame is judged to be a key image frame. As long as the selected key image frames are representative and their number is smaller than the total number of image frames acquired by the vision sensor, the method helps reduce the use of computing resources.
As the vision sensor keeps collecting image frames, new key image frames are continually obtained from the newly collected frames, and these key image frames can then be used to update the map and the motion trajectory.
Step 103: when the current image frame is a key image frame, update the constructed map according to the current image frame.
Map updating is a continuous process: the previously updated map is updated again according to the feature-point data in the key image frame.
How the map is updated from the feature-point data in a key image frame is common knowledge to those skilled in the art and is not described further here.
As is known to those skilled in the art, a simultaneous localization and mapping method must provide mapping and localization at the same time. Accordingly, the method of the invention can also update the motion trajectory of the vision sensor according to its pose information at the moment the current image frame is acquired, and thereby output a 3D motion trajectory map. Generating a trajectory map from pose information is common knowledge to those skilled in the art and is therefore not detailed in this embodiment.
The simultaneous localization and mapping method provided by the invention exploits the pronounced grayscale variation caused by the strong light-dark contrast of indoor environments: during pose estimation, feature points are obtained by corner detection, and the pose information is then estimated from the photometric difference at each feature point, without computing descriptors for the feature points or using them for image matching, which saves computation.
Based on any of the above embodiments, in this embodiment, performing pose estimation on the current image frame by sparse image alignment to obtain the pose information of the vision sensor at the moment the current image frame is acquired further comprises:
performing nonlinear least-squares optimization, using a bundle adjustment algorithm, on the pose information of the vision sensor at the moment the current image frame is acquired.
In this embodiment, given the pronounced light-dark characteristics of indoor environments, a bundle adjustment algorithm may be used to perform nonlinear least-squares optimization on the pose information of the vision sensor at the moment the current image frame is acquired, yielding more accurate pose information. This helps improve the accuracy of subsequent results.
The foregoing embodiment mentioned that, based on the pose information of the vision sensor at the moment the current image frame is acquired, the coordinates of the three-dimensional points observed by the vision sensor at that moment can also be obtained. Accordingly, in this embodiment a bundle adjustment algorithm may further be used to perform nonlinear least-squares optimization on those three-dimensional point coordinates, yielding more accurate coordinates.
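For illustration, the sketch below refines a single pose against fixed 3-D points by nonlinear least squares on the reprojection error, using SciPy; a full bundle adjustment would optimize all keyframe poses and points jointly with a sparse solver, and the (rotation-vector, translation) parameterization is an assumption for brevity.

```python
# A minimal sketch of least-squares pose refinement (the core operation of
# bundle adjustment), assuming SciPy. points3d: (N,3) world points;
# observed_uv: (N,2) measured pixels; K: 3x3 intrinsics;
# pose6 = (rotation vector, translation).
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def reprojection_residuals(pose6, points3d, observed_uv, K):
    R = Rotation.from_rotvec(pose6[:3]).as_matrix()
    cam = points3d @ R.T + pose6[3:]          # world -> camera coordinates
    proj = cam @ K.T                          # pinhole projection
    uv = proj[:, :2] / proj[:, 2:3]
    return (uv - observed_uv).ravel()         # stacked pixel errors

def refine_pose(pose6_init, points3d, observed_uv, K):
    result = least_squares(reprojection_residuals, pose6_init,
                           args=(points3d, observed_uv, K))
    return result.x
```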
In the simultaneous localization and mapping method provided by the invention, the pose information of the vision sensor at the moment the current image frame is acquired is optimized with a bundle adjustment algorithm, which improves the accuracy of the pose information and, in turn, the accuracy of the motion trajectory and the map.
Based on any of the above embodiments, in this embodiment, determining whether the current image frame is a key image frame according to the pose information of the vision sensor at the moment the current image frame is acquired comprises:
determining that the current image frame is a key image frame when it simultaneously satisfies the following three conditions:
the number of feature points in the current image frame is greater than a first threshold, and the number of feature points on any single object in the current image frame is less than or equal to a second threshold;
the number of image frames between the current image frame and the previous key image frame is greater than or equal to a third threshold and less than or equal to a fourth threshold;
the value of the relative pose between the current image frame and the previous image frame is less than or equal to a fifth threshold.
The image frames acquired by the vision sensor amount to a large volume of data; if every acquired image were processed when computing the motion trajectory of the moving target or updating the map, a large amount of computing resources would be consumed. In this step, therefore, the image frames are screened, and only the frames that satisfy the conditions are used to update the trajectory and the map in the subsequent steps.
Specifically, the current image frame is determined to be a key image frame when it simultaneously satisfies the following three conditions:
1) The number of feature points in the current image frame is greater than a first threshold, and the number of feature points on any single object in the current image frame is less than or equal to a second threshold.
Requiring more feature points than the first threshold ensures that the current image frame is identifiable; capping the number of feature points on a single object at the second threshold ensures that the image carries no excessive redundancy. Both thresholds can be set according to the actual situation, for example a first threshold of 70 and a second threshold of 100.
2) The number of image frames between the current image frame and the previous key image frame is greater than or equal to a third threshold and less than or equal to a fourth threshold.
Here the third threshold is less than the fourth threshold. The specific values depend on the computing power of the electronic device running the method: the stronger the computing power, the smaller the values can be, for example a third threshold of 10 and a fourth threshold of 20.
3) The value of the relative pose between the current image frame and the previous image frame is less than or equal to a fifth threshold.
If the pose changes too much between adjacent image frames, i.e., the value of the relative pose between the current image frame and the previous image frame exceeds the fifth threshold, an abnormal situation may have occurred: for example, the electronic device running the method was moved some distance by an external force, or the vision sensor fell and was damaged. Image frames obtained in such abnormal situations clearly should not serve as key frames, or the accuracy of the final result would suffer. The specific value of the fifth threshold is determined by the number of frames the vision sensor captures per second and the movement speed of the platform carrying it. The fifth threshold is computed as (speed / frames per second) × coefficient, where the coefficient defaults to 3. For example, if a camera captures 20 frames per second and the platform carrying it (e.g., the intelligent mobile robot on which the camera is mounted) moves at 0.1 m/s, the fifth threshold is (0.1 / 20) × 3 = 0.015 m.
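As an illustration of the three conditions taken together, a decision function might look like the sketch below; the threshold values follow the examples quoted above and are not mandatory.

```python
# A minimal sketch of the keyframe decision. Threshold values follow the
# examples in the text and would be tuned per device in practice.
def is_key_frame(n_features, max_features_on_one_object,
                 frames_since_last_key, relative_pose_value,
                 fps=20.0, speed_mps=0.1, coeff=3.0):
    t1, t2 = 70, 100                  # condition 1: feature counts
    t3, t4 = 10, 20                   # condition 2: frame gap
    t5 = (speed_mps / fps) * coeff    # condition 3: (0.1 / 20) * 3 = 0.015 m
    return (n_features > t1
            and max_features_on_one_object <= t2
            and t3 <= frames_since_last_key <= t4
            and relative_pose_value <= t5)
```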
As this selection process shows, the number of key image frames is smaller than the total number of image frames acquired by the vision sensor over the same period, which helps reduce the use of computing resources. Meanwhile, as the vision sensor keeps collecting image frames, new key image frames are continually obtained from the newly collected frames, and these key image frames can then be used to update the map and the motion trajectory.
Based on any of the above embodiments, in this embodiment, the method further comprises:
when the current image frame is a key image frame, performing loop detection on the current image frame based on a bag-of-words model, wherein the feature data in the bag-of-words model are stored in binary format.
When loop detection is performed on a key image frame, its similarity to other frames must be computed based on the bag-of-words model, with the feature data of the key frame serving as the basis of the similarity computation. In this embodiment, the feature data are stored in the bag-of-words model as binary data. The computer can therefore interpret the feature data directly as it reads them, which greatly reduces the loading time.
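For illustration, the sketch below contrasts binary storage with text parsing: a packed binary array can be loaded (even memory-mapped) directly, and a simple Hamming-based score stands in for the full bag-of-words similarity. The file name and the packed-descriptor layout are assumptions.

```python
# A minimal sketch of binary feature storage and a Hamming-distance score.
import numpy as np

# Stored once in binary, e.g. ORB-style 256-bit descriptors as 32 uint8 bytes:
#   np.save("vocab_descriptors.npy", descriptors.astype(np.uint8))
# Loading is near-instant compared with parsing a txt file line by line:
#   vocab = np.load("vocab_descriptors.npy", mmap_mode="r")

def hamming_similarity(desc_a, desc_b):
    """Fraction of matching bits between two packed binary descriptors."""
    differing_bits = np.unpackbits(np.bitwise_xor(desc_a, desc_b)).sum()
    return 1.0 - differing_bits / (8.0 * desc_a.size)
```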
Once a loop is found to exist, the pose information of all the key image frames can be optimized by pose graph optimization, suppressing the accumulated drift error.
Pose graph optimization represents the optimization variables as the vertices of a pose graph, represents the error terms as its edges, and minimizes the errors. How the pose information of the key image frames is optimized by pose graph optimization is common knowledge to those skilled in the art and is not described further here.
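Purely to illustrate the vertex/edge structure, the toy sketch below optimizes 2-D poses (x, y, theta) with SciPy; a real system would use SE(3) poses, hold the first pose fixed to remove gauge freedom, and rely on a sparse solver.

```python
# A minimal 2-D pose-graph sketch: vertices are poses, edges carry measured
# relative poses, and the stacked edge errors are minimized.
import numpy as np
from scipy.optimize import least_squares

def relative_pose(pose_i, pose_j):
    """Pose of j expressed in the frame of i, for 2-D poses (x, y, theta)."""
    dx, dy = pose_j[:2] - pose_i[:2]
    c, s = np.cos(pose_i[2]), np.sin(pose_i[2])
    return np.array([c * dx + s * dy, -s * dx + c * dy, pose_j[2] - pose_i[2]])

def graph_residuals(flat_poses, edges):
    poses = flat_poses.reshape(-1, 3)
    res = []
    for i, j, measured in edges:       # each edge: (index_i, index_j, z_ij)
        err = relative_pose(poses[i], poses[j]) - measured
        err[2] = (err[2] + np.pi) % (2 * np.pi) - np.pi   # wrap the angle
        res.append(err)
    return np.concatenate(res)

def optimize_pose_graph(initial_poses, edges):
    result = least_squares(graph_residuals, initial_poses.ravel(), args=(edges,))
    return result.x.reshape(-1, 3)
```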
Before the simultaneous localization and mapping method terminates, a separate thread can be started temporarily to run a global bundle adjustment, globally optimizing the pose information of the key image frames and thereby obtaining the optimal map and motion trajectory.
The simultaneous localization and mapping method provided by the invention stores the feature data of the bag-of-words model as binary data, so that the computer can interpret the feature data directly as it reads them, greatly reducing the loading time.
Based on any of the above embodiments, in this embodiment, before the step of performing pose estimation on the current image frame by sparse image alignment, the method further comprises:
when the current image frame is the second frame acquired by the vision sensor in a shooting session and the previous image frame is the first frame acquired in that session, obtaining depth information of the current image frame and the previous image frame;
constructing a map according to the depth information;
and computing the rotation matrix and translation matrix between the current image frame and the previous image frame from the depth information to obtain the pose information of the vision sensor at the moment the current image frame is acquired, and generating a motion trajectory from that pose information.
The foregoing embodiments described how pose estimation is performed on the current image frame. If, however, the current image frame is the first or second frame acquired by the vision sensor in a shooting session, the pose estimation method described above does not apply. This embodiment therefore describes how the pose and trajectory are obtained in this special case.
Specifically, a triangulation method can be used to obtain the depth information of the current image frame and the previous image frame. The homography matrix between the two frames is then computed from the depth information, the rotation and translation matrices of the two frames are computed from the homography matrix, and the computed matrices are rescaled; the rescaled rotation and translation matrices constitute the pose information. Map points are created from the depth information at the same time, yielding the map.
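For illustration, the sketch below shows the homography-based part of such an initialization with OpenCV; selecting the physically valid candidate among the decompositions and fixing the monocular scale are omitted, and the matched-point inputs are assumed.

```python
# A minimal sketch of two-frame initialization from a homography.
import cv2

def initialize_from_homography(pts_prev, pts_curr, K):
    """pts_prev, pts_curr: Nx2 matched pixel coordinates; K: 3x3 intrinsics."""
    H, inlier_mask = cv2.findHomography(pts_prev, pts_curr, cv2.RANSAC, 3.0)
    n, rotations, translations, normals = cv2.decomposeHomographyMat(H, K)
    # A real system tests each of the n candidate (R, t) pairs against the
    # matches (e.g. positive triangulated depth) and rescales t, since the
    # absolute scale is unobservable with a monocular camera.
    return rotations, translations
```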
Those skilled in the art will readily understand that the pose information obtained in this embodiment is the initial pose information: when pose estimation is subsequently performed by sparse image alignment, this pose information is needed to compute the pose information of the next image frame.
Likewise, the map obtained in this embodiment is the initial map; it is updated step by step in the subsequent stages until a map with more complete information has been constructed.
The simultaneous localization and mapping method provided by the invention exploits the pronounced grayscale variation caused by the strong light-dark contrast of indoor environments: during pose estimation, feature points are obtained by corner detection, and the pose information is then estimated from the photometric difference at each feature point, without computing descriptors for the feature points or using them for image matching, which saves computation.
Based on any of the above embodiments, Fig. 2 is a schematic diagram of the simultaneous localization and mapping apparatus provided by the present invention. As shown in Fig. 2, the apparatus comprises:
a pose estimation module 201, configured to perform pose estimation on the current image frame by sparse image alignment to obtain pose information of the vision sensor at the moment the current image frame is acquired.
In this embodiment, the pose estimation module 201 may specifically comprise:
a feature point selecting unit, configured to select corner points in the current image frame as feature points;
a relative pose calculating unit, configured to minimize the photometric errors of all the feature points in the current image frame to obtain the relative pose between the current image frame and the previous image frame;
and a pose information generating unit, configured to obtain the pose information of the vision sensor at the moment the current image frame is acquired from the relative pose between the two frames and the pose information of the vision sensor at the moment the previous image frame was acquired.
A key image frame determining module 202, configured to determine whether the current image frame is a key image frame according to the pose information of the vision sensor at the moment the current image frame is acquired.
A map updating module 203, configured to update the constructed map according to the current image frame when the current image frame is a key image frame.
The simultaneous localization and mapping apparatus provided by the invention exploits the pronounced grayscale variation caused by the strong light-dark contrast of indoor environments: during pose estimation, feature points are obtained by corner detection, and the pose information is then estimated from the photometric difference at each feature point, without computing descriptors for the feature points or using them for image matching, which saves computation.
Fig. 3 is a schematic diagram of the physical structure of an electronic device according to the present invention. As shown in Fig. 3, the electronic device may include: a processor 310, a communication interface 320, a memory 330, and a communication bus 340, where the processor 310, the communication interface 320, and the memory 330 communicate with one another via the communication bus 340. The processor 310 may call logic instructions in the memory 330 to perform the following method:
performing pose estimation on the current image frame by sparse image alignment to obtain pose information of the vision sensor at the moment the current image frame is acquired;
determining whether the current image frame is a key image frame according to the pose information of the vision sensor at the moment the current image frame is acquired;
and, when the current image frame is a key image frame, updating the constructed map according to the current image frame.
It should be noted that, in a specific implementation, the electronic device of this embodiment may be a server, a PC, or another device, provided that its structure includes the processor 310, the communication interface 320, the memory 330, and the communication bus 340 shown in Fig. 3, where the processor 310, the communication interface 320, and the memory 330 communicate with one another via the communication bus 340 and the processor 310 can call the logic instructions in the memory 330 to execute the above method. This embodiment does not limit the specific form of the electronic device.
In addition, the logic instructions in the memory 330 may be implemented as software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied as a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Further, an embodiment of the present invention discloses a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions that, when executed by a computer, enable the computer to perform the methods provided by the above method embodiments, for example comprising:
performing pose estimation on the current image frame by sparse image alignment to obtain pose information of the vision sensor at the moment the current image frame is acquired;
determining whether the current image frame is a key image frame according to the pose information of the vision sensor at the moment the current image frame is acquired;
and, when the current image frame is a key image frame, updating the constructed map according to the current image frame.
In another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program performs the method provided by the foregoing embodiments, for example comprising:
performing pose estimation on the current image frame by sparse image alignment to obtain pose information of the vision sensor at the moment the current image frame is acquired;
determining whether the current image frame is a key image frame according to the pose information of the vision sensor at the moment the current image frame is acquired;
and, when the current image frame is a key image frame, updating the constructed map according to the current image frame.
The above-described apparatus embodiments are merely illustrative. Units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment, which those of ordinary skill in the art can understand and implement without creative effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, or alternatively by hardware. Based on this understanding, the above technical solutions may be embodied as a software product stored in a computer-readable storage medium, such as a ROM/RAM, magnetic disk, or optical disk, including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute the methods described in the embodiments or in parts of the embodiments.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be replaced by equivalents, and that such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A simultaneous localization and mapping method, comprising:
performing pose estimation on the current image frame by sparse image alignment to obtain pose information of the vision sensor at the moment the current image frame is acquired;
determining whether the current image frame is a key image frame according to the pose information of the vision sensor at the moment the current image frame is acquired;
and, when the current image frame is a key image frame, updating the constructed map according to the current image frame.
2. The simultaneous localization and mapping method according to claim 1, wherein performing pose estimation on the current image frame by sparse image alignment to obtain the pose information of the vision sensor at the moment the current image frame is acquired comprises:
selecting corner points in the current image frame as feature points;
minimizing the photometric errors of all the feature points in the current image frame to obtain the relative pose between the current image frame and the previous image frame;
and obtaining the pose information of the vision sensor at the moment the current image frame is acquired from the relative pose between the two frames and the pose information of the vision sensor at the moment the previous image frame was acquired.
3. The simultaneous localization and mapping method according to claim 2, wherein minimizing the photometric errors of all the feature points in the current image frame to obtain the relative pose between the current image frame and the previous image frame comprises:
step S1, determining the first pixel point in the previous image frame that corresponds to a first feature point, the first feature point being any feature point in the current image frame;
step S2, determining a first pixel block centered on the first feature point and a second pixel block centered on the first pixel point, and taking the photometric error between the first pixel block and the second pixel block as the photometric error of the first feature point;
and step S3, obtaining the photometric errors of all the feature points in the current image frame by repeating steps S1 and S2, and minimizing these photometric errors to obtain the relative pose between the current image frame and the previous image frame.
4. The simultaneous localization and mapping method according to claim 2, wherein performing pose estimation on the current image frame by sparse image alignment to obtain the pose information of the vision sensor at the moment the current image frame is acquired further comprises:
performing nonlinear least-squares optimization, using a bundle adjustment algorithm, on the pose information of the vision sensor at the moment the current image frame is acquired.
5. The simultaneous localization and mapping method according to claim 2, wherein determining whether the current image frame is a key image frame according to the pose information of the vision sensor at the moment the current image frame is acquired comprises:
determining that the current image frame is a key image frame when it simultaneously satisfies the following three conditions:
the number of feature points in the current image frame is greater than a first threshold, and the number of feature points on any single object in the current image frame is less than or equal to a second threshold;
the number of image frames between the current image frame and the previous key image frame is greater than or equal to a third threshold and less than or equal to a fourth threshold;
the value of the relative pose between the current image frame and the previous image frame is less than or equal to a fifth threshold.
6. The simultaneous localization and mapping method according to any one of claims 1 to 5, further comprising:
when the current image frame is a key image frame, performing loop detection on the current image frame based on a bag-of-words model, wherein the feature data in the bag-of-words model are stored in binary format.
7. The simultaneous localization and mapping method according to any one of claims 1 to 5, wherein, before the step of performing pose estimation on the current image frame by sparse image alignment, the method further comprises:
when the current image frame is the second frame acquired by the vision sensor in a shooting session and the previous image frame is the first frame acquired in that session, obtaining depth information of the current image frame and the previous image frame;
constructing a map according to the depth information;
and computing the rotation matrix and translation matrix between the current image frame and the previous image frame from the depth information to obtain the pose information of the vision sensor at the moment the current image frame is acquired, and generating a motion trajectory from that pose information.
8. A simultaneous localization and mapping apparatus, comprising:
a pose estimation module, configured to perform pose estimation on the current image frame by sparse image alignment to obtain pose information of the vision sensor at the moment the current image frame is acquired;
a key image frame determining module, configured to determine whether the current image frame is a key image frame according to the pose information of the vision sensor at the moment the current image frame is acquired;
and a map updating module, configured to update the constructed map according to the current image frame when the current image frame is a key image frame.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the simultaneous localization and mapping method according to any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, performs the steps of the simultaneous localization and mapping method according to any one of claims 1 to 7.
CN202110169094.5A 2021-02-07 2021-02-07 Simultaneous positioning and map construction method and device, electronic equipment and storage medium Pending CN112967340A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110169094.5A CN112967340A (en) 2021-02-07 2021-02-07 Simultaneous positioning and map construction method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110169094.5A CN112967340A (en) 2021-02-07 2021-02-07 Simultaneous positioning and map construction method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112967340A 2021-06-15

Family

ID=76275203

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110169094.5A Pending CN112967340A (en) 2021-02-07 2021-02-07 Simultaneous positioning and map construction method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112967340A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114019953A (en) * 2021-10-08 2022-02-08 中移(杭州)信息技术有限公司 Map construction method, map construction device, map construction equipment and storage medium
CN115700507A (en) * 2021-07-30 2023-02-07 北京小米移动软件有限公司 Map updating method and device
WO2023160445A1 (en) * 2022-02-22 2023-08-31 维沃移动通信有限公司 Simultaneous localization and mapping method and apparatus, electronic device, and readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107025668A (en) * 2017-03-30 2017-08-08 华南理工大学 A kind of design method of the visual odometry based on depth camera
CN109387204A (en) * 2018-09-26 2019-02-26 东北大学 The synchronous positioning of the mobile robot of dynamic environment and patterning process in faced chamber
WO2019157925A1 (en) * 2018-02-13 2019-08-22 视辰信息科技(上海)有限公司 Visual-inertial odometry implementation method and system
CN110866496A (en) * 2019-11-14 2020-03-06 合肥工业大学 Robot positioning and mapping method and device based on depth image

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107025668A (en) * 2017-03-30 2017-08-08 华南理工大学 A kind of design method of the visual odometry based on depth camera
WO2019157925A1 (en) * 2018-02-13 2019-08-22 视辰信息科技(上海)有限公司 Visual-inertial odometry implementation method and system
CN109387204A (en) * 2018-09-26 2019-02-26 东北大学 The synchronous positioning of the mobile robot of dynamic environment and patterning process in faced chamber
CN110866496A (en) * 2019-11-14 2020-03-06 合肥工业大学 Robot positioning and mapping method and device based on depth image

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Ni Ying et al.: "UAV Navigation and Positioning Technology" (无人机导航定位技术), Aviation Industry Press, 30 November 2020, pages 127-128 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115700507A (en) * 2021-07-30 2023-02-07 北京小米移动软件有限公司 Map updating method and device
CN115700507B (en) * 2021-07-30 2024-02-13 北京小米移动软件有限公司 Map updating method and device
CN114019953A (en) * 2021-10-08 2022-02-08 中移(杭州)信息技术有限公司 Map construction method, map construction device, map construction equipment and storage medium
CN114019953B (en) * 2021-10-08 2024-03-19 中移(杭州)信息技术有限公司 Map construction method, device, equipment and storage medium
WO2023160445A1 (en) * 2022-02-22 2023-08-31 维沃移动通信有限公司 Simultaneous localization and mapping method and apparatus, electronic device, and readable storage medium

Similar Documents

Publication Title
CN110070615B (en) Multi-camera cooperation-based panoramic vision SLAM method
CN109307508B (en) Panoramic inertial navigation SLAM method based on multiple key frames
CN107160395B (en) Map construction method and robot control system
CN106940704B (en) Positioning method and device based on grid map
CN110310333B (en) Positioning method, electronic device and readable storage medium
CN111462207A (en) RGB-D simultaneous positioning and map creation method integrating direct method and feature method
CN112967340A (en) Simultaneous positioning and map construction method and device, electronic equipment and storage medium
CN110176032B (en) Three-dimensional reconstruction method and device
CN111210463A (en) Virtual wide-view visual odometer method and system based on feature point auxiliary matching
CN112652020B (en) Visual SLAM method based on AdaLAM algorithm
JPS63213005A (en) Guiding method for mobile object
CN111998862A (en) Dense binocular SLAM method based on BNN
CN112183506A (en) Human body posture generation method and system
CN112541423A (en) Synchronous positioning and map construction method and system
CN114494150A (en) Design method of monocular vision odometer based on semi-direct method
KR20180035359A (en) Three-Dimensional Space Modeling and Data Lightening Method using the Plane Information
CN113052907B (en) Positioning method of mobile robot in dynamic environment
CN113420590B (en) Robot positioning method, device, equipment and medium in weak texture environment
CN113361365A (en) Positioning method and device, equipment and storage medium
CN117011660A (en) Dot line feature SLAM method for fusing depth information in low-texture scene
CN112562068A (en) Human body posture generation method and device, electronic equipment and storage medium
CN116128966A (en) Semantic positioning method based on environmental object
CN113628284B (en) Pose calibration data set generation method, device and system, electronic equipment and medium
Zieliński et al. Keyframe-based dense mapping with the graph of view-dependent local maps
CN113847907A (en) Positioning method and device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination