CN115131420A

CN115131420A - Visual SLAM method and device based on key frame optimization

Info

Publication number: CN115131420A
Application number: CN202210729683.9A
Authority: CN
Inventors: 付诚; 陈志涛; 夏华佳
Original assignee: Wuhan Yixun Beidou Space Time Technology Co ltd
Current assignee: Wuhan Yixun Beidou Space Time Technology Co ltd
Priority date: 2022-06-24
Filing date: 2022-06-24
Publication date: 2022-09-30

Abstract

The invention provides a visual SLAM method and device based on key frame optimization. The method comprises: receiving a current image frame from an image acquisition device; acquiring first pose information based on the current image frame and the previous image frame; when the current image frame is a key frame, matching the current key frame against the previous key frame to obtain map points; using the map points as constraint conditions, screening out target historical key frames and establishing a common-view directed connected graph from the target historical key frames and the current key frame; performing local bundle adjustment based on the common-view directed connected graph and the first pose information to obtain pose estimation information; and performing local bundle adjustment based on a second sliding window and the pose estimation information to obtain second pose information and updated map points, so that a scene map is established based on the updated map points. The visual SLAM method and device based on key frame optimization achieve both low computational cost and high precision.

Description

Visual SLAM method and device based on key frame optimization

Technical Field

The invention relates to the technical field of image processing, in particular to a visual SLAM method and device based on key frame optimization.

Background

In recent years, Simultaneous Localization and Mapping (SLAM) technology has developed rapidly. It provides six-degree-of-freedom motion estimation for mobile robots and autonomous vehicles and can recover the 3D structure of the surrounding environment from a continuous 3D video stream. Generally, SLAM is divided into a front end and a back end. The visual front end estimates the camera pose from the image sequence captured by the camera, mainly through the relation between adjacent frames. Because this chained computation relies only on local constraints, pose errors inevitably accumulate and eventually produce a large drift. The optimization back end of SLAM refines the inaccurate poses obtained by the visual front end and builds the map. Its idea is to select key frames globally and to use the relations between key frames to establish global constraints with a larger temporal and spatial span.

At present, a relatively mature visual SLAM scheme is based on ORBSLAM2. It tracks and matches feature points between two adjacent image frames using feature point descriptors, derives the initial poses of all frames in advance by chained recursion, and then performs pose optimization on the key frames. However, using descriptors to match feature points for estimation between ordinary frames is unnecessary, despite its high accuracy. Moreover, when a sliding window is used for SLAM pose optimization, the local trajectory cannot be fully explored, so pose estimation remains unstable and inaccurate even when a large amount of computing power is invested.

Disclosure of Invention

The invention provides a visual SLAM method and device based on keyframe optimization, which address the insufficient robustness in the prior art that arises when visual pose estimation degrades in some vehicle-mounted scenes because the local trajectory is not fully explored.

The invention provides a visual SLAM method based on key frame optimization, which comprises the following steps:

receiving a current image frame from an image acquisition device;

matching the current image frame and the previous image frame of the current image frame by using an optical flow method to acquire first pose information;

under the condition that the current image frame meets a preset condition, adding the current image frame serving as a current key frame to a first sliding window, and matching between the current key frame and a previous key frame of the current key frame by using a descriptor-based matching method to obtain a map point;

using the map points as constraint conditions, screening out target historical key frames from the first sliding window, and establishing a common-view directed connected graph based on the target historical key frames and the current key frame;

based on the common-view directed connected graph and the first pose information, performing local bundle adjustment to acquire pose estimation information;

based on a second sliding window and the pose estimation information, performing local bundle adjustment to obtain second pose information and updated map points, so that an SLAM scene map is established based on the updated map points;

wherein the first sliding window is used for storing the key frames captured while a carrier travels a preset distance, the second sliding window is used for storing the key frames captured within a preset historical duration together with the key frames captured while the carrier travels the preset distance, and the second pose information is the optimized pose information of the current key frame.

According to the visual SLAM method based on keyframe optimization provided by the invention, the matching is carried out by utilizing an optical flow method based on the current image frame and the last image frame of the current image frame to acquire first pose information, and the method comprises the following steps:

preprocessing the current image frame, and screening out a target feature point from a last image frame of the current image frame;

matching the current image frame and the previous image frame of the current image frame by utilizing a pyramid optical flow method based on the target feature points to obtain a matching point set;

and performing pose calculation with six degrees of freedom based on the matching point set to acquire the first pose information.

According to the visual SLAM method based on keyframe optimization provided by the invention, performing local bundle adjustment based on the common-view directed connected graph and the first pose information to acquire the pose estimation information comprises:

performing a fusion operation on the current key frame and each historical key frame in the common-view directed connected graph, and screening out target map points from the map points;

and performing local bundle adjustment based on the first pose information and the target map points to acquire the pose estimation information and update the target map points.

According to the visual SLAM method based on keyframe optimization provided by the invention, the preprocessing is carried out on the current image frame, and the target feature point is screened out from the last image frame of the current image frame, and the method comprises the following steps:

performing histogram equalization on the current image frame to obtain the processed current image frame;

and based on a target model, carrying out motion estimation on the feature points of the previous image frame of the current image frame to obtain target feature points.

According to the visual SLAM method based on key frame optimization, the target model comprises one of a uniformly accelerated motion model, a constant-velocity model and a variable-velocity model.

According to the visual SLAM method based on key frame optimization provided by the invention, the method further comprises the following steps:

adding a new current key frame to the first sliding window or the second sliding window, and marginalizing historical key frames; and

under the condition that the loop detection result of the current key frame is true, performing loop correction by using a closed-loop constraint.

The present invention also provides a visual SLAM device based on keyframe optimization, comprising:

the receiving module is used for receiving the current image frame from the image acquisition device;

the initial pose acquisition module is used for matching the current image frame and the previous image frame of the current image frame by using an optical flow method to acquire first pose information;

the key frame matching module is used for adding the current image frame serving as a current key frame to a first sliding window under the condition that the current image frame meets a preset condition, and matching the current key frame with a previous key frame of the current key frame by using a descriptor-based matching method to obtain map points;

a common-view relationship establishing module, configured to screen out a target history key frame from the first sliding window based on the map point as a constraint condition, and establish a common-view directed connected graph based on the target history key frame and the current key frame;

the first bundle adjustment module is used for performing local bundle adjustment based on the common-view directed connected graph and the first pose information to acquire the pose estimation information;

the second bundle adjustment module is used for performing local bundle adjustment based on a second sliding window and the pose estimation information to acquire second pose information and updated map points, so as to establish an SLAM scene map based on the updated map points.

The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to implement the keyframe optimization-based visual SLAM method as described in any one of the above.

The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a visual SLAM method based on keyframe optimization as recited in any one of the above.

The present invention also provides a computer program product comprising a computer program which, when executed by a processor, implements a visual SLAM method based on keyframe optimization as described in any one of the above.

The visual SLAM method and device based on key frame optimization provided by the invention perform tracking and matching with an optical flow method and derive the first pose information in a chained manner from the matching relation between the current image frame and the adjacent ordinary frame. A current key frame is extracted from the current image frames through a preset condition and is tracked and matched against the adjacent key frame in a sliding window using a descriptor-based method to obtain map points. Local bundle adjustment is then performed with the map points to optimize the second pose information of the current key frame and the related map points. A certain amount of precision is sacrificed in SLAM front-end processing to reduce the computational load, and the precision of key frame pose optimization is improved in SLAM back-end processing, so that both low computational cost and high precision are achieved.

Drawings

In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

FIG. 1 is a schematic flow chart of a visual SLAM method based on key frame optimization according to the present invention;

FIG. 2 is a schematic flow chart of hierarchical bundle adjustment provided by the present invention;

FIG. 3 is a schematic structural diagram of a ramp model provided by the present invention;

FIG. 4 is a schematic diagram of a visual SLAM apparatus based on key frame optimization according to the present invention;

FIG. 5 is a schematic structural diagram of an electronic device provided in the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The terms first, second and the like in the description and in the claims of the present application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that embodiments of the application may be practiced in sequences other than those illustrated or described herein, and that the terms "first," "second," and the like are generally used herein in a generic sense and do not limit the number of terms, e.g., the first term can be one or more than one.

It is to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

The terms "comprises" and "comprising" indicate the presence of the described features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

Fig. 1 is a schematic flowchart of a visual SLAM method based on keyframe optimization according to the present invention. As shown in fig. 1, a visual SLAM method based on keyframe optimization provided by an embodiment of the present invention includes: step 101, receiving a current image frame from an image acquisition device.

It should be noted that the main execution body of the visual SLAM method based on key frame optimization provided by the embodiment of the present invention is a visual SLAM device based on key frame optimization.

The visual SLAM method based on the key frame optimization provided by the embodiment of the application is suitable for synchronously positioning and drawing the acquired video image through the electronic equipment with the visual SLAM device based on the key frame optimization, and recovering the 3D structure of the surrounding environment of the equipment.

The visual SLAM method based on key frame optimization provided by the embodiment of the application is applicable to, but not limited to, a mobile robot or an autonomous vehicle. The device starts moving from an unknown position in an unknown environment, localizes itself during movement according to its estimated position and the map, and at the same time builds an incremental map on the basis of this self-localization, thereby realizing autonomous positioning and navigation of the device.

The electronic device described above may be implemented in various forms. For example, the electronic device described in the embodiments of the present application may be a terminal device that integrates a visual SLAM device optimized based on a key frame and an image capture device, such as a mobile terminal of a mobile phone, a smart phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a navigation device, a smart band, a smart watch, a digital camera, and the like.

The electronic device described in the embodiment of the present application may also be a terminal device that is configured with a real-time image stabilization device for a video image, in which case the terminal needs to be in communication connection with a video acquisition device; an example is a fixed terminal such as a desktop computer. In the following, it is assumed that the electronic device is a mobile terminal. However, it will be understood by those skilled in the art that, apart from elements used specifically for mobile purposes, the configuration according to the embodiment of the present application can also be applied to a fixed terminal.

Specifically, in step 101, the visual SLAM device based on keyframe optimization receives a continuous image sequence transmitted by the image capture device in real time, and reads out the latest image frame as the current image frame for processing.

Step 102, based on the current image frame and the previous image frame of the current image frame, performing matching by using an optical flow method to obtain first pose information.

It should be noted that the image frame immediately preceding the current image frame refers to the adjacent image frame that precedes the current image frame in the image sequence received in step 101.

Specifically, in step 102, the visual SLAM device based on keyframe optimization extracts feature points existing in a previous image frame of the current image frame, performs matching tracking on the feature points with the current image frame, and may calculate first pose information corresponding to the current image frame by using pixel coordinate information of the matched feature points.

The first pose information is the pose of the image acquisition device at the moment corresponding to the current image frame, as the image acquisition device moves together with the electronic equipment. The first pose information comprises a rotation variation R and a translation variation t.

It can be understood that the above operations are repeated for the image sequence sent by the image acquisition device to obtain the relative displacement information between any adjacent frames, so as to obtain the camera (i.e. the image acquisition device) pose corresponding to the shooting time of the image sequence.
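The chained derivation of camera poses from frame-to-frame motion can be sketched as follows. This is a minimal illustration, not the patented implementation: the helper names `compose` and `chain_poses` are hypothetical, and the convention that a pose (R, t) maps camera coordinates to world coordinates is an assumption, since the source does not state one.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]
Pose = Tuple[Matrix, Vector]  # (R, t); assumed convention: x_world = R * x_cam + t


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def mat_vec(a: Matrix, v: Vector) -> Vector:
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def compose(pose: Pose, delta: Pose) -> Pose:
    """Append a relative motion (R_rel, t_rel) to an absolute pose (R, t)."""
    R, t = pose
    R_rel, t_rel = delta
    moved = mat_vec(R, t_rel)  # relative translation expressed in the world frame
    return mat_mul(R, R_rel), [ti + mi for ti, mi in zip(t, moved)]


def chain_poses(deltas: Sequence[Pose]) -> List[Pose]:
    """Accumulate frame-to-frame motions, starting from the identity pose."""
    identity: Pose = ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
                      [0.0, 0.0, 0.0])
    poses = [identity]
    for delta in deltas:
        poses.append(compose(poses[-1], delta))
    return poses


# Two identical steps: turn 90 degrees about z and move 1 unit along the camera x axis.
ROT_Z_90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
trajectory = chain_poses([(ROT_Z_90, [1.0, 0.0, 0.0])] * 2)
```

Because every pose is built from the previous one, any error in a single relative motion propagates to all later poses, which is the drift that the key frame optimization in the later steps is meant to reduce.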

It is to be noted that the feature point matching algorithm between adjacent frames is an algorithm other than the feature point descriptor matching-based method, and exemplarily, the feature point matching algorithm includes, but is not limited to, an optical flow method such as a gradient (differential) -based optical flow method, a matching-based optical flow method, an energy (frequency) -based optical flow method, a phase-based optical flow method, or a neurodynamic optical flow method. The present invention is not particularly limited.

Preferably, the feature point matching algorithm between adjacent frames is a gradient-based (differential) optical flow method, which computes the displacement of each feature point by constructing and minimizing a gray-scale error function for the matched feature points. Its feature point tracking and matching accuracy is higher than that of the other listed algorithms.
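As an illustration of minimizing a gray-scale error function, the sketch below performs one single-level Lucas-Kanade step for one feature window. Lucas-Kanade is one common gradient-based optical flow method; the source does not name a specific algorithm, so this choice, the function name `lucas_kanade_step`, and the synthetic test image are all assumptions.

```python
from typing import List, Tuple

Image = List[List[float]]  # image[y][x], gray values


def lucas_kanade_step(prev: Image, curr: Image, cx: int, cy: int,
                      radius: int) -> Tuple[float, float]:
    """Estimate the displacement (u, v) of the window centred at (cx, cy).

    Minimizes the sum over the window of (Ix*u + Iy*v + It)^2, i.e. the
    linearized gray-scale (brightness constancy) error.
    """
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(cy - radius, cy + radius + 1):
        for x in range(cx - radius, cx + radius + 1):
            ix = (prev[y][x + 1] - prev[y][x - 1]) / 2.0  # horizontal gradient
            iy = (prev[y + 1][x] - prev[y - 1][x]) / 2.0  # vertical gradient
            it = curr[y][x] - prev[y][x]                  # temporal difference
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("window has too little texture to estimate flow")
    # Solve the 2x2 normal equations [sxx sxy; sxy syy] [u v]^T = -[sxt syt]^T.
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v


def make_image(dx: float, dy: float, size: int = 9) -> Image:
    """Synthetic saddle-shaped image, shifted by (dx, dy) pixels."""
    c = size // 2
    return [[(x - c - dx) * (y - c - dy) for x in range(size)]
            for y in range(size)]


prev_img = make_image(0.0, 0.0)
curr_img = make_image(0.3, -0.2)  # content moved 0.3 px right and 0.2 px up
u, v = lucas_kanade_step(prev_img, curr_img, cx=4, cy=4, radius=3)
```

A pyramidal variant, as mentioned in the disclosure, would repeat this step from coarse to fine image resolutions so that larger displacements can be tracked.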

Step 103, under the condition that the current image frame meets the preset condition, adding the current image frame to the first sliding window as the current key frame, and matching the current key frame against the previous key frame of the current key frame by using a descriptor-based matching method to obtain map points.

The first sliding window is used for storing the key frames captured while the carrier travels a preset distance.

The preset condition is a determination condition set in advance based on whether the current image frame is a key frame.

The preset condition may be a comprehensive judgment condition for multiple attribute information of the current image frame, or an independent judgment condition for certain attribute information of the current image frame. The embodiment of the present invention does not specifically limit the setting of the preset condition.

For example, because key frames are used to build the map, map points should be neither too dense nor too sparse. Preset conditions established according to this criterion may be:

(1) The number of feature points tracked in the current image frame is small, for example fewer than 50.

This condition describes the situation in which the current image frame tracks only a few feature points, which indicates that the scene in the current image frame has changed significantly from the previous key frame. A new key frame then needs to be inserted as soon as possible to avoid tracking failure in the visual front end.

(2) Among the feature points tracked in the current image frame, too few are map points of the previous key frame. The threshold is typically set as a proportion of the map points of the previous key frame, for example 75%.

This condition describes the situation in which few map points of the previous key frame are still tracked in the current image frame, which indicates that the camera has moved significantly since the previous key frame was inserted, so a new key frame can be inserted.

As another example, because the map needs reasonably continuous key frames, the interval between key frame insertions should be neither too long nor too short. A preset condition established according to this criterion may be: MAX frames have elapsed since the last key frame was inserted. Here MAX is a positive integer representing the upper limit of the number of frames between key frame insertions, which is not specifically limited in the embodiment of the present invention. For example, MAX may be set to 20 frames.
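The example conditions above can be combined into a single key frame test, sketched below. The source leaves open whether the conditions are evaluated jointly or independently; combining them with a logical OR is an assumption, and the names `FrameStats` and `is_keyframe` are hypothetical. The default thresholds are the example values from the text (50 points, 75%, 20 frames).

```python
from dataclasses import dataclass


@dataclass
class FrameStats:
    tracked_points: int            # feature points tracked in the current frame
    tracked_map_points: int        # tracked points that are map points of the previous key frame
    prev_keyframe_map_points: int  # map points owned by the previous key frame
    frames_since_keyframe: int     # frames elapsed since the last key frame


def is_keyframe(stats: FrameStats, min_tracked: int = 50,
                min_ratio: float = 0.75, max_gap: int = 20) -> bool:
    """Return True if any of the three example conditions holds (assumed OR)."""
    # Condition (1): too few tracked feature points.
    if stats.tracked_points < min_tracked:
        return True
    # Condition (2): too small a share of the previous key frame's map points survives.
    if (stats.prev_keyframe_map_points > 0 and
            stats.tracked_map_points < min_ratio * stats.prev_keyframe_map_points):
        return True
    # Interval condition: MAX frames have elapsed since the last key frame.
    return stats.frames_since_keyframe >= max_gap


steady = FrameStats(tracked_points=120, tracked_map_points=95,
                    prev_keyframe_map_points=100, frames_since_keyframe=5)
turning = FrameStats(tracked_points=120, tracked_map_points=60,
                     prev_keyframe_map_points=100, frames_since_keyframe=5)
```

Here `is_keyframe(steady)` is false because tracking is healthy and the interval is short, while `is_keyframe(turning)` is true because only 60% of the previous key frame's map points are still tracked.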

It should be noted that the first sliding window is an independently maintained sliding window, and only includes a key frame sequence corresponding to the carrier in the driving process of the preset distance. Illustratively, the first sliding window is established by a key frame contained by the carrier traveling 50 meters.
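A distance-bounded window of this kind can be sketched with a double-ended queue. The class names `KeyFrame` and `DistanceWindow` and the use of a cumulative odometer reading per key frame are assumptions made for illustration; the 50-meter default is the example value from the text.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class KeyFrame:
    frame_id: int
    timestamp: float  # seconds
    odometer: float   # cumulative distance travelled by the carrier, in metres


class DistanceWindow:
    """Keeps only the key frames captured within the last `max_distance` metres."""

    def __init__(self, max_distance: float = 50.0) -> None:
        self.max_distance = max_distance
        self.frames: Deque[KeyFrame] = deque()

    def add(self, keyframe: KeyFrame) -> List[KeyFrame]:
        """Insert a new key frame and return the old ones that leave the window."""
        self.frames.append(keyframe)
        removed: List[KeyFrame] = []
        # The newest frame always stays, so the deque never becomes empty here.
        while keyframe.odometer - self.frames[0].odometer > self.max_distance:
            removed.append(self.frames.popleft())
        return removed


window = DistanceWindow(max_distance=50.0)
for i, metres in enumerate([0.0, 20.0, 40.0, 60.0]):
    dropped = window.add(KeyFrame(frame_id=i, timestamp=2.0 * i, odometer=metres))
```

Because the bound is a distance rather than a frame count, the number of key frames in the window changes with vehicle speed and key frame rate. In a full system the frames returned by `add` would be marginalized rather than simply discarded.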

Specifically, in step 103, the visual SLAM device based on keyframe optimization checks the current image frame against the preset conditions. The determination has two possible results: success and failure.

If the determination succeeds, the current image frame meets the preset conditions and is added to the first sliding window as the current key frame. The current key frame is then matched and tracked against the previous key frame in the first sliding window by using the descriptor-based matching method, and the feature point pairs that can be matched and tracked are taken as preselected map points.

If the determination fails, the current image frame does not meet the preset conditions and is not a key frame. In this case the initial poses of all image frames can be derived in a chained manner from the first pose information of the current image frame, and a new image frame is read to continue the SLAM front-end processing.

Wherein the preselected map points are used to create a map in the visual SLAM to characterize the surrounding environment.

It should be noted that the feature point matching algorithm between adjacent key frames is an algorithm other than the optical flow method, and this is not particularly limited in the embodiment of the present invention.

Preferably, the feature point matching algorithm between adjacent keyframes is a method based on feature point descriptor matching, similarity discrimination is performed by calculating feature descriptors of feature points to perform matching, and the accuracy of feature point tracking matching is higher than that of other algorithms, but is more time-consuming at the same time.
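Descriptor-based matching can be illustrated with binary descriptors compared by Hamming distance, as is usual for BRIEF-style descriptors. The sketch below is a brute-force matcher with a mutual nearest-neighbour check; the function names, the 8-bit toy descriptors and the distance threshold are assumptions, not details given in the source.

```python
from typing import List, Sequence, Tuple


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def nearest(query: int, candidates: Sequence[int]) -> int:
    """Index of the candidate descriptor closest to the query."""
    return min(range(len(candidates)), key=lambda j: hamming(query, candidates[j]))


def match_descriptors(desc_a: Sequence[int], desc_b: Sequence[int],
                      max_distance: int = 2) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) that are mutual nearest neighbours and close enough."""
    matches: List[Tuple[int, int]] = []
    if not desc_a or not desc_b:
        return matches
    for i, descriptor in enumerate(desc_a):
        j = nearest(descriptor, desc_b)
        mutual = nearest(desc_b[j], desc_a) == i  # cross-check in the other direction
        if mutual and hamming(descriptor, desc_b[j]) <= max_distance:
            matches.append((i, j))
    return matches


previous_kf = [0b11110000, 0b00001111, 0b10101010]
current_kf = [0b00001110, 0b11110001, 0b01010101]
pairs = match_descriptors(previous_kf, current_kf)
```

Every descriptor is compared with every candidate, which is why this kind of matching costs more time than optical flow tracking and is reserved here for key frames.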

Step 104, using the map points as constraint conditions, screening out target historical key frames from the first sliding window, and establishing a common-view directed connected graph based on the target historical key frames and the current key frame.

It should be noted that the first sliding window stores in advance the camera pose information at the capture moments of a plurality of historical key frames, that is, the optimized rotation variation R and translation variation t of each historical key frame.

Specifically, in step 104, the visual SLAM device based on keyframe optimization matches the key frames in the first sliding window again using the ORB method with BRIEF descriptors: feature points are detected with the FAST feature point detector, and the N feature points with the highest Harris corner response are then selected from the FAST feature points using the Harris corner measure. The preselected map points corresponding to the N matched feature point pairs are set as map points. All historical key frames that can observe these map points are extracted from the first sliding window and, together with the current key frame, are used to establish the common-view directed connected graph required for graph optimization.

The embodiment of the present invention does not specifically limit how the common-view directed connected graph is constructed.

Exemplarily, different map points are used as class-A nodes representing the observed content, and the different key frames in the first sliding window are used as class-B nodes representing the image acquisition device at the different poses and capture moments covered by the sliding window. Any class-A node may be connected to any class-B node, and an edge between them indicates that the image acquisition device can observe that map point at the key frame capture time corresponding to the class-B node.

If a class-B node and a class-A node cannot be connected to form an edge, the image acquisition device cannot observe that map point at the capture time of the key frame corresponding to the class-B node. When two class-B nodes can both be connected to the same class-A node, the key frames corresponding to the two class-B nodes have a common-view relationship.
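The two node classes and the common-view relationship can be sketched with plain dictionaries. The input format (key frame id mapped to the set of map point ids it observes) and the function names `observers` and `covisible_keyframes` are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Set


def observers(observations: Dict[int, Set[int]]) -> Dict[int, Set[int]]:
    """Invert the edges: map point (class A) -> key frames (class B) that observe it."""
    by_point: Dict[int, Set[int]] = defaultdict(set)
    for keyframe_id, point_ids in observations.items():
        for point_id in point_ids:
            by_point[point_id].add(keyframe_id)
    return by_point


def covisible_keyframes(observations: Dict[int, Set[int]],
                        current: int) -> Dict[int, int]:
    """Historical key frames sharing map points with `current`, with the shared count."""
    shared: Dict[int, int] = defaultdict(int)
    for keyframes in observers(observations).values():
        if current in keyframes:
            for keyframe_id in keyframes:
                if keyframe_id != current:
                    shared[keyframe_id] += 1
    return dict(shared)


# Key frame id -> ids of the map points it observes; key frame 4 is the current one.
window_observations = {1: {10, 11}, 2: {11, 12}, 3: {13}, 5: {14}, 4: {11, 12, 13}}
targets = covisible_keyframes(window_observations, current=4)
```

In this example key frames 1, 2 and 3 share at least one map point with the current key frame and would be selected as target historical key frames, while key frame 5 has no common view and is left out.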

Step 105, performing local bundle adjustment based on the common-view directed connected graph and the first pose information to acquire pose estimation information.

It should be noted that the local Bundle Adjustment (BA) refers to adjusting the camera pose and the feature point position simultaneously, so that the light reflected from each feature point (Bundles of light rays) can finally pass through the camera optical center through the Adjustment (Adjustment). The BA is typically constructed as a least squares problem, adjusting both the pose of the camera and the coordinates of the feature points by minimizing the reprojection error.
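The least squares problem described above is commonly written as follows. The notation is an assumption introduced here for illustration: (R_i, t_i) is the pose of key frame i mapping world coordinates to camera coordinates, P_j is the 3D position of map point j, u_ij is the observed pixel position of point j in key frame i, K is the camera intrinsic matrix, π is the perspective division (x, y, z) → (x/z, y/z), and V(i) is the set of map points observed in key frame i.

```latex
\min_{\{R_i,\,t_i\},\,\{P_j\}}
\;\sum_{i}\;\sum_{j \in \mathcal{V}(i)}
\left\| \, u_{ij} - \pi\!\left( K \left( R_i P_j + t_i \right) \right) \right\|^{2}
```

Each term is the squared reprojection error of one observation, and the sum runs only over the key frames and map points connected by an edge in the common-view graph.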

Specifically, in step 105, the visual SLAM device based on keyframe optimization limits the optimization of BA within a dynamic first sliding window, and iteratively optimizes the current pose estimation information (i.e., the optimized rotation variation R and translation variation t) of the camera by adding the current keyframe into the first sliding window and marginalizing the old keyframe to minimize the reprojection error between two adjacent keyframes within the first sliding window.
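The quantity that such a windowed bundle adjustment minimizes can be evaluated as in the sketch below. It only computes the total squared reprojection error for the key frames in a window; the iterative solver that adjusts poses and points is omitted. The pinhole model without lens distortion, the world-to-camera pose convention and all names are assumptions.

```python
from typing import Dict, List, Tuple

Pose = Tuple[List[List[float]], List[float]]  # (R, t), assumed world-to-camera
Intrinsics = Tuple[float, float, float, float]  # fx, fy, cx, cy


def project(pose: Pose, point: List[float], fx: float, fy: float,
            cx: float, cy: float) -> Tuple[float, float]:
    """Project a 3D world point into pixel coordinates with a pinhole model."""
    R, t = pose
    xc, yc, zc = (sum(R[r][c] * point[c] for c in range(3)) + t[r]
                  for r in range(3))
    if zc <= 0.0:
        raise ValueError("point is behind the camera")
    return fx * xc / zc + cx, fy * yc / zc + cy


def window_cost(poses: Dict[int, Pose], points: Dict[int, List[float]],
                observations: Dict[Tuple[int, int], Tuple[float, float]],
                intrinsics: Intrinsics) -> float:
    """Sum of squared reprojection errors over all (key frame, map point) observations."""
    total = 0.0
    for (keyframe_id, point_id), (u, v) in observations.items():
        pu, pv = project(poses[keyframe_id], points[point_id], *intrinsics)
        total += (u - pu) ** 2 + (v - pv) ** 2
    return total


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
poses = {0: (IDENTITY, [0.0, 0.0, 0.0]), 1: (IDENTITY, [-1.0, 0.0, 0.0])}
points = {7: [1.0, 2.0, 4.0]}
observations = {(0, 7): (76.0, 100.0), (1, 7): (50.0, 98.0)}
cost = window_cost(poses, points, observations, (100.0, 100.0, 50.0, 50.0))
```

A solver would repeatedly perturb the poses and map point positions to lower this cost, for example with Gauss-Newton or Levenberg-Marquardt iterations, which are common choices but are not specified in the source.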

Step 106, performing local bundle adjustment based on the second sliding window and the pose estimation information to acquire second pose information and updated map points, so that an SLAM scene map is established based on the updated map points.

It should be noted that the second sliding window is an independently maintained window different from the first sliding window. It includes not only the key frames captured while the carrier travels the preset distance, but also the key frames within a preset historical duration before the carrier travels that distance. Illustratively, the second sliding window comprises two parts: the first part consists of the key frames captured in the 10 seconds preceding the 50-meter travel segment, and the second part consists of the key frames captured while the carrier travels those 50 meters.
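The two-part selection can be sketched as a function over a time-ordered list of key frames. The `KeyFrame` tuple, the function name `second_window` and the use of a cumulative odometer value are assumptions; the 50-meter and 10-second defaults are the example values from the text.

```python
from typing import List, NamedTuple


class KeyFrame(NamedTuple):
    frame_id: int
    timestamp: float  # seconds
    odometer: float   # cumulative distance travelled by the carrier, in metres


def second_window(keyframes: List[KeyFrame], max_distance: float = 50.0,
                  history_seconds: float = 10.0) -> List[KeyFrame]:
    """Select the history part followed by the distance part, oldest first."""
    if not keyframes:
        return []
    newest = keyframes[-1]
    # Distance part: key frames captured within the last `max_distance` metres.
    distance_part = [k for k in keyframes
                     if newest.odometer - k.odometer <= max_distance]
    segment_start = distance_part[0].timestamp
    # History part: older key frames within `history_seconds` before that segment.
    history_part = [k for k in keyframes
                    if newest.odometer - k.odometer > max_distance
                    and segment_start - k.timestamp <= history_seconds]
    return history_part + distance_part


track = [KeyFrame(0, 0.0, 0.0), KeyFrame(1, 8.0, 10.0), KeyFrame(2, 15.0, 30.0),
         KeyFrame(3, 20.0, 60.0), KeyFrame(4, 25.0, 75.0)]
selected = second_window(track)
```

For this track the distance part contains key frames 2 to 4, and key frame 1 is added as history because it was captured 7 seconds before the segment started; key frame 0 is 15 seconds older than the segment start and is excluded.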

Specifically, in step 106, the visual SLAM device based on the keyframe optimization establishes a global constraint with a larger time and space span by using the relationship between the keyframes stored in the second sliding window, performs local Bundle Adjustment (BA) to minimize a reprojection error, optimizes the latest pose estimation information based on the principle of a smooth historical trajectory, and can acquire the second pose information.

Exemplarily, fig. 2 is a schematic flow chart of hierarchical bundle adjustment provided by the present invention. As shown in fig. 2, the first BA estimates the latest pose only for the current keyframe in the first sliding window, and the second BA expands the number of keyframes included in the sliding window from two dimensions of travel distance and time, so as to be able to seamlessly connect with the historical track, update and optimize the latest pose and map points obtained by the current keyframe, so as to achieve the purpose of reducing or eliminating the SLAM accumulated error, and further, establish the SLAM scene map by using the updated map points.

In the prior art, descriptors are usually used to match feature points for estimation between ordinary frames, which improves the accuracy of the initial pose to some extent. However, when a sliding window is then used to optimize the SLAM poses of the key frames, the size of the sliding window is fixed, so the local trajectory may not be fully explored under different vehicle driving conditions (mainly different vehicle speeds). A large amount of computing power is spent on the image matching strategy, yet visual pose estimation still degrades in some scenes because the explored trajectory is discontinuous.

The embodiment of the invention processes ordinary frames with a general optical flow method to complete the initial chained pose calculation. This is coarse, but it saves a great deal of computing power in the SLAM front end. Furthermore, the key frame images are processed with ORB descriptor-based matching to refine the accuracy of the map points, and hierarchical bundle adjustment with two dynamic sliding windows is used when optimizing the key frame poses, which improves the accuracy of pose optimization, keeps the explored trajectory smooth, and improves stability.

Compared with the feature point descriptor-based method used in the prior art, the optical flow method has lower precision, but it processes a frame in approximately 29 milliseconds (ms), whereas the feature point descriptor-based method takes approximately 51 ms, so the computation speed is nearly doubled. Because the key frames are optimized at the SLAM back end, the overall precision can be equivalent, or even improved in some regions with repetitive environmental textures. The most obvious advantage is that high precision can be ensured while a large amount of computing resources is saved in the SLAM front end, so the method can be applied to platforms with limited computing power, which to some extent removes a limitation on the application of SLAM.

The embodiment of the invention performs tracking and matching based on an optical flow method and derives the first pose information by chained recursion from the matching relationship between the current image frame and the adjacent common frame. The current key frame is selected from the current image frames by a preset condition and is tracked and matched against the adjacent key frame in the sliding window by a descriptor method to obtain map points, and hierarchical, step-by-step local bundle adjustment (BA) is performed with the map points to obtain the smoothed second pose information of the current key frame and the related map points. A certain amount of precision is sacrificed in SLAM front-end processing in exchange for low computational cost; in SLAM back-end processing, the first-level BA yields the latest pose information, and the second-level BA keeps the new track point smooth within the motion trajectory so that it connects seamlessly with the historical trajectory, which improves the precision of key frame pose optimization in SLAM back-end processing.

On the basis of any of the above embodiments, performing matching by an optical flow method based on the current image frame and the previous image frame of the current image frame to obtain the first pose information includes: preprocessing the current image frame, and screening out target feature points from the previous image frame of the current image frame.

Specifically, in step 102, the keyframe-based optimized visual SLAM device preprocesses the current image frame according to a preset requirement, and selects a target feature point from the previous image frame of the current image frame.

The preprocessing refers to processing performed on the original current image frame before feature point tracking and matching. Preprocessing removes distortion from the current image frame so that the resulting frame retains the useful real information, enhances the detectability of the relevant information, and simplifies the data as far as possible.

The preprocessing methods applied to the current image frame include, but are not limited to, pixel intensity transformation, geometric transformation, smoothing, and edge detection, such that the two image frames being tracked in the image sequence have consistent image quality. The embodiment of the present invention does not specifically limit the preprocessing.

The embodiment of the present invention does not specifically limit the manner of extracting the target feature point.

For example, the target feature points may be extracted from the previous image frame of the current image frame with a corner detector, for matching and tracking by the sparse optical flow method.

For example, the target feature points may be uniformly extracted at certain intervals in the previous image frame of the current image frame to perform matching tracking by the dense optical flow method.

The current image frame and the previous image frame of the current image frame are matched by a pyramid optical flow method based on the target feature points to obtain a matching point set.

Specifically, the visual SLAM device based on key frame optimization constructs a window gray level error function with target feature point offset as an optimization quantity according to a target feature point initial value, estimates the offset by using a pyramid optical flow method, and obtains a matching point set of two adjacent common image frames.

The embodiment of the present invention does not specifically limit the pyramid structure and the specific implementation process of the pyramid optical flow method.

Preferably, an eight-layer pyramid optical flow method is adopted for feature point matching, points corresponding to the target feature points are matched in the current image frame, and the two groups of points form a matching point set. The specific implementation process is as follows:

(1) According to the coordinates p of a target feature point in the input layer-0 image (namely, the previous image frame of the current image frame), the coordinates of the corresponding feature point on the other seven pyramid layers are calculated: p₁, p₂, p₃, p₄, p₅, p₆ and p₇.

(3) The initial optical flow coordinates m₀, m₁, m₂, m₃, m₄, m₅, m₆ and m₇ are computed, where m₀ = p, m₁ = p₁, m₂ = p₂, m₃ = p₃, m₄ = p₄, m₅ = p₅, m₆ = p₆ and m₇ = p₇.

(4) With m₇ as input, the optical flow end point n₇ on layer 7 is determined.

(5) The coordinate n₆ corresponding to n₇ on layer 6 is calculated, and with n₆ as input the optical flow end point q₆ on layer 6 is determined. And so on, until the coordinate q₀ corresponding to q₁ on layer 0 is calculated; with q₀ as input, the optical flow end point q on layer 0 is found, and each pair (p, q) is added to the matching point set.
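The coarse-to-fine flow of the steps above can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment: it assumes each pyramid layer halves the resolution of the layer below it, and the per-layer tracker is replaced by a hypothetical toy function (`make_toy_tracker`) that can move at most one pixel per layer, which mimics the small-displacement limit of window-based optical flow. All function names are illustrative.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]
# refine(level, start, guess) -> optical flow end point on that layer
Refine = Callable[[int, Point, Point], Point]


def pyramid_coords(p: Point, levels: int = 8) -> List[Point]:
    """Coordinates p, p1, ..., p7 of one feature point on each pyramid layer."""
    return [(p[0] / 2 ** k, p[1] / 2 ** k) for k in range(levels)]


def coarse_to_fine(p: Point, refine: Refine, levels: int = 8) -> Point:
    """Track p from the coarsest layer down to layer 0 and return q."""
    coords = pyramid_coords(p, levels)
    guess = coords[-1]  # initial optical flow coordinate on the top layer
    end = guess
    for level in range(levels - 1, -1, -1):
        end = refine(level, coords[level], guess)
        # Map the end point to the next finer layer (coordinates double).
        guess = (2.0 * end[0], 2.0 * end[1])
    return end


def make_toy_tracker(d: Point, max_step: float = 1.0) -> Refine:
    """Hypothetical stand-in tracker: true layer-0 displacement is d, but each
    call can move the guess by at most max_step pixels per axis."""
    def refine(level: int, start: Point, guess: Point) -> Point:
        scale = 2 ** level
        target = (start[0] + d[0] / scale, start[1] + d[1] / scale)
        return tuple(g + max(-max_step, min(max_step, t - g))
                     for g, t in zip(guess, target))
    return refine
```

With a true displacement of (12, -4) pixels, the eight-layer version recovers the full motion, whereas a single-layer run with the same one-pixel limit stops after one pixel. This is the reason for solving on the coarse layers first.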

It can be understood that after the matching point set is obtained, abnormal points need to be removed. Methods for removing mismatched points from the matching point set include, but are not limited to, the Random Sample Consensus algorithm (RANSAC), the M-estimator Sample Consensus algorithm (MSAC), and Least Median of Squares (LMedS), which is not specifically limited in this embodiment of the present invention.

Preferably, abnormal points in the tracking result are filtered by applying, in sequence, the Grid-based Motion Statistics (GMS) algorithm for fast, ultra-robust feature correspondence, the epipolar constraint, and the RANSAC method, so as to obtain a more accurate matching point set.
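As an illustration of the epipolar-constraint stage only (GMS and RANSAC are not shown), the following sketch keeps a match when the point in the current frame lies close to the epipolar line induced by its partner in the previous frame. It assumes a fundamental matrix F is already available; the function names and the one-pixel threshold are hypothetical choices, not values given by the embodiment.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Match = Tuple[Point, Point]
Matrix3 = Sequence[Sequence[float]]


def epipolar_distance(F: Matrix3, x1: Point, x2: Point) -> float:
    """Pixel distance from x2 to the epipolar line l = F * [x1, 1]."""
    h1 = (x1[0], x1[1], 1.0)
    a, b, c = (sum(F[r][k] * h1[k] for k in range(3)) for r in range(3))
    norm = math.hypot(a, b)
    if norm == 0.0:
        return float("inf")  # degenerate line: treat the match as an outlier
    return abs(a * x2[0] + b * x2[1] + c) / norm


def filter_by_epipolar(F: Matrix3, matches: List[Match],
                       max_dist: float = 1.0) -> List[Match]:
    """Keep only matches whose epipolar distance is within max_dist pixels."""
    return [(x1, x2) for x1, x2 in matches
            if epipolar_distance(F, x1, x2) <= max_dist]
```

For a camera translating purely along its x axis (with normalized coordinates), the epipolar lines are horizontal, so a match whose row changes by several pixels is rejected.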

Six-degree-of-freedom pose calculation is performed based on the matching point set to acquire the first pose information.

Specifically, in step 102, the visual SLAM device based on key frame optimization performs six-degree-of-freedom pose calculation on the filtered matching point set and solves the relative pose change between the current image frame and the previous image frame of the current image frame, that is, the rotation change R and the translation change t in the first pose information, to complete the chained recursion of the pose.
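The chained recursion can be sketched as repeatedly composing each relative change (R, t) onto the accumulated pose. The convention used here is an assumption, since the embodiment does not state one: (R, t) maps a point from the previous camera frame into the current camera frame, so the accumulated pose is updated as R_w = R · R_(w-1) and t_w = R · t_(w-1) + t. The function names are illustrative.

```python
from typing import List, Tuple

Mat3 = List[List[float]]
Vec3 = List[float]
Pose = Tuple[Mat3, Vec3]

IDENTITY: Mat3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def compose(delta: Pose, prev: Pose) -> Pose:
    """Apply the relative change delta on top of the accumulated pose prev."""
    d_rot, d_trans = delta
    rot, trans = prev
    new_rot = [[sum(d_rot[i][k] * rot[k][j] for k in range(3))
                for j in range(3)] for i in range(3)]
    new_trans = [sum(d_rot[i][k] * trans[k] for k in range(3)) + d_trans[i]
                 for i in range(3)]
    return new_rot, new_trans


def chain_poses(deltas: List[Pose]) -> List[Pose]:
    """Return the pose of every frame, starting from the identity pose."""
    pose: Pose = (IDENTITY, [0.0, 0.0, 0.0])
    poses = [pose]
    for delta in deltas:
        pose = compose(delta, pose)
        poses.append(pose)
    return poses
```

Because every pose depends on all earlier relative estimates, any error in one step is carried into all later poses, which is the drift that the key frame optimization at the back end is intended to reduce.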

The embodiment of the invention performs tracking and matching based on the pyramid optical flow method, obtains the matching point set between the current image frame and the adjacent common frame, and derives the first pose information by chained recursion through pose calculation on the matching point set. Further, the current key frame is locally optimized by using a descriptor method. Calculation time is saved in SLAM front-end processing, where a certain amount of precision is sacrificed in exchange for low computational cost, while the precision of key frame pose optimization is improved in SLAM back-end processing, so that low computational cost and high overall precision are both achieved.

On the basis of any one of the above embodiments, performing local bundle adjustment based on the common-view directed connection graph and the first pose information to obtain the pose estimation information includes: performing a fusion operation by using the current key frame and each historical key frame in the common-view directed connection graph, and screening out target map points from the map points.

Specifically, the visual SLAM device based on key frame optimization culls, according to the common-view relationships contained in the common-view directed connection graph and the corresponding criteria, the map points determined by the current key frame in the second matching, and performs the fusion operation between the map points remaining after culling and the target historical key frames with a strong common-view relationship, so that feature points corresponding to the same point are merged into target map points.

The criterion for culling map points of the current key frame is to keep observability consistent: a map point is culled if it has not been observed by consecutive, reliable key frames since it was created.

Optionally, the culling criterion may be that no fewer than 2 key frames have elapsed since the map point was created, but the number of key frames in which the map point has been observed is less than the threshold of 2.

Optionally, the culling criterion may also be that more than a certain number of key frames have elapsed since the map point was created.
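The first culling criterion can be written as a small predicate. This is a sketch under assumptions: key frames are identified by consecutive integer indices, the `MapPoint` record and its field names are hypothetical, and the second criterion is not sketched because its key frame count is not specified.

```python
from dataclasses import dataclass


@dataclass
class MapPoint:
    created_at: int      # index of the key frame that created the point
    n_observations: int  # number of key frames that have observed the point


def should_cull(point: MapPoint, current_kf: int,
                min_age: int = 2, min_observations: int = 2) -> bool:
    """True when the point is old enough but still observed too rarely."""
    age = current_kf - point.created_at
    return age >= min_age and point.n_observations < min_observations
```

A point created two key frames ago and seen only once is culled, while a newly created point is kept until enough key frames have elapsed to judge it.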

Preferably, the feature points of the current key frame image that meet the two criteria are fused with the key frame images having a strong common-view relationship, their 3D coordinates are recovered by Direct Linear Transformation (DLT), and the map points and the relationships between key frames are then updated.
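For reference, the standard DLT triangulation can be written as follows. The notation is an assumption introduced here for illustration: key frame i has the 3×4 projection matrix P_i with rows p_i^{1T}, p_i^{2T}, p_i^{3T}, and observes the point at pixel (u_i, v_i); X is the homogeneous 3D point to be recovered.

```latex
% Each observation contributes two linear equations in X.
\begin{aligned}
\left(u_i\,\mathbf{p}_i^{3T} - \mathbf{p}_i^{1T}\right)\mathbf{X} &= 0,\\
\left(v_i\,\mathbf{p}_i^{3T} - \mathbf{p}_i^{2T}\right)\mathbf{X} &= 0.
\end{aligned}

% Stacking the equations of n >= 2 key frames gives a homogeneous system.
A\,\mathbf{X} = \mathbf{0},\qquad
A = \begin{bmatrix}
u_1\,\mathbf{p}_1^{3T} - \mathbf{p}_1^{1T}\\
v_1\,\mathbf{p}_1^{3T} - \mathbf{p}_1^{2T}\\
\vdots\\
u_n\,\mathbf{p}_n^{3T} - \mathbf{p}_n^{1T}\\
v_n\,\mathbf{p}_n^{3T} - \mathbf{p}_n^{2T}
\end{bmatrix}\in\mathbb{R}^{2n\times 4}.

% The least-squares solution with ||X|| = 1 is the right singular vector of A
% associated with its smallest singular value; dividing by the fourth
% component yields the Euclidean 3D coordinates.
```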

Specifically, the visual SLAM device based on key frame optimization performs local graph optimization on the common-view directed connection graph by local bundle adjustment, using the first pose information and the target map points. Only the pose of the latest key frame is updated, and the poses of historical key frames are not updated; meanwhile, all key frames in the first sliding window provide observation constraints for the map points and jointly update the target map points, so as to weaken or eliminate the accumulated SLAM positioning error.

The embodiment of the invention screens out the target map points based on the common-view directed connection graph, performs local bundle adjustment on the common-view directed connection graph with the first pose information and the target map points, obtains the pose estimation information, and updates the target map points. The precision of key frame pose optimization can be improved in SLAM back-end processing, graph optimization is performed by local BA, and the accumulated SLAM positioning error is weakened or eliminated. Further, low computational cost and high overall precision are both achieved.

On the basis of any of the above embodiments, preprocessing the current image frame and screening out target feature points from the previous image frame of the current image frame includes: performing histogram equalization on the current image frame to obtain the processed current image frame.

Specifically, in step 102, the visual SLAM device based on keyframe optimization performs histogram equalization on the current image frame to enhance the contrast of the current image frame and make it clear.
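Histogram equalization can be sketched for a flat list of 8-bit grayscale values as follows. This is the textbook formulation based on the cumulative distribution of gray levels; the function name is illustrative, and a production system would normally rely on an image processing library instead.

```python
from typing import List


def equalize_histogram(pixels: List[int], levels: int = 256) -> List[int]:
    """Spread the gray levels of an image over the full range [0, levels - 1]."""
    hist = [0] * levels
    for value in pixels:
        hist[value] += 1

    # Cumulative distribution of the gray levels.
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)

    n = len(pixels)
    cdf_min = next((c for c in cdf if c > 0), 0)
    if n == cdf_min:
        return list(pixels)  # empty or constant image: nothing to stretch

    lut = [max(0, round((c - cdf_min) * (levels - 1) / (n - cdf_min)))
           for c in cdf]
    return [lut[value] for value in pixels]
```

A dark, low-contrast input such as gray levels 10 to 40 is stretched to cover 0 to 255, which makes corners and textures easier to track under changing illumination.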

Motion estimation is performed on the feature points of the previous image frame of the current image frame based on the target model to obtain the target feature points.

Specifically, the visual SLAM device based on key frame optimization processes the feature points in the previous image frame of the current image frame with the target model, estimates the position coordinates of those feature points in the current image frame, and uses the estimated position coordinates as the target feature points, so that the pyramid optical flow method can then be applied to track and match the common frames.

The embodiment of the invention performs histogram equalization on the current image frame and obtains the target feature points through the target model. The precision of key frame pose optimization can be improved in SLAM back-end processing, graph optimization is performed by local BA, and the accumulated SLAM positioning error is weakened or eliminated. Furthermore, low computational cost and high overall precision are both achieved.

On the basis of any one of the above embodiments, the target model includes one of a uniform speed change model, a constant velocity model, and a variable speed model.

Specifically, in step 102, the visual SLAM device based on key frame optimization uses the target model to estimate, for the feature points in the previous image frame of the current image frame, their position coordinates in the current image frame, and, taking the estimated position coordinates as initial values, constructs a window gray-level error function with the feature point offset as the optimization variable, so that the offset is estimated by the 8-layer pyramid optical flow method to obtain the target feature point matches between the two adjacent image frames.

The embodiment of the present invention does not specifically limit the type of the target model.

Optionally, the target model may be a uniform speed change model (that is, uniform acceleration); when the model is applied, the position coordinates to which the feature points are predicted to shift in the current image frame under uniformly changing speed are used as initial values.

Optionally, the target model may be a constant velocity model; when the model is applied, the position coordinates to which the feature points are predicted to shift in the current image frame under constant velocity are used as initial values.

Optionally, the target model may be a variable speed model; when the model is applied, the position coordinates to which the feature points are predicted to shift in the current image frame under varying speed are used as initial values.
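As an illustration of the simplest of the three options, the constant velocity model, the pose of the current frame can be predicted by assuming that the motion between the last two frames repeats. The sketch below assumes 4×4 homogeneous poses that map world coordinates to camera coordinates, so the last relative motion is V = T_(w-1) · T_(w-2)⁻¹ and the prediction is V · T_(w-1). These conventions and the function names are assumptions; the uniform speed change and variable speed models are not sketched.

```python
from typing import List

Mat4 = List[List[float]]


def matmul(a: Mat4, b: Mat4) -> Mat4:
    """Product of two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def rigid_inverse(t: Mat4) -> Mat4:
    """Inverse of a rigid transform [R t; 0 1], namely [R^T  -R^T t; 0 1]."""
    rot_t = [[t[j][i] for j in range(3)] for i in range(3)]
    trans = [-sum(rot_t[i][k] * t[k][3] for k in range(3)) for i in range(3)]
    return [rot_t[i] + [trans[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def predict_constant_velocity(t_prev2: Mat4, t_prev1: Mat4) -> Mat4:
    """Predict the current pose by repeating the last inter-frame motion."""
    velocity = matmul(t_prev1, rigid_inverse(t_prev2))
    return matmul(velocity, t_prev1)


def translation(x: float, y: float, z: float) -> Mat4:
    """Helper for the usage example: a pure translation."""
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]
```

For example, if the translation component moved from 1 to 3 along x over the last two frames, the predicted translation for the current frame is 5.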

Exemplarily, Fig. 3 is a schematic structural diagram of the uniform speed change model provided by the present invention. As shown in Fig. 3, an implementation using the uniform speed change model is given:

$$p^{*} = s\,K\,T_{cw}^{*}\,P$$

where (symbol not reproduced in the source text) denotes the pose transformation of the w-th frame relative to the (w-1)-th frame in the camera coordinate system (hence every subscript carries c), and the w-th frame is taken as the current image frame.

$T_{cw}^{*}$ is the predicted value of the pose transformation of the w-th frame. s is the scale factor of the camera, K is the intrinsic matrix of the camera (generally composed of the focal length and distortion factor of the camera), P is the coordinates of the pixel point before motion estimation, and $p^{*}$ is the coordinates of the pixel point after motion estimation.

$T_{cw-1}$ is the pose information of the (w-1)-th frame, expressed as:

$$T_{cw-1} = \begin{bmatrix} R' & t' \\ 0^{T} & 1 \end{bmatrix}$$

where R' is the direction cosine matrix of the (w-1)-th frame, representing the attitude, with a size of 3 × 3, and t' is the position vector of the (w-1)-th frame, representing the position, with a size of 3 × 1.

The embodiment of the invention gives more accurate initial values to the pyramid optical flow method based on the uniform speed change model, reduces the search range of the target feature points and can improve the processing efficiency of the pyramid optical flow method.
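The prediction formula p* = s·K·T*·P can be evaluated as in the following sketch. It rests on assumptions that go beyond the text: P is treated as a 3D point that the 4×4 predicted pose maps into the camera frame, the scale factor s is taken as the reciprocal of the resulting depth, and the intrinsic values in the usage example are made up for illustration.

```python
from typing import List, Sequence, Tuple


def predict_pixel(K: Sequence[Sequence[float]],
                  T_pred: Sequence[Sequence[float]],
                  P: Sequence[float]) -> Tuple[float, float]:
    """Project the 3D point P with the predicted pose T_pred and intrinsics K."""
    homogeneous = list(P) + [1.0]
    # Point expressed in the predicted camera frame.
    cam: List[float] = [sum(T_pred[i][j] * homogeneous[j] for j in range(4))
                        for i in range(3)]
    depth = cam[2]
    if depth <= 0.0:
        raise ValueError("point is not in front of the camera")
    uvw = [sum(K[i][j] * cam[j] for j in range(3)) for i in range(3)]
    # Dividing by the depth is the multiplication by s = 1 / depth.
    return uvw[0] / depth, uvw[1] / depth
```

The returned pixel serves as the starting guess of the optical flow search, so the tracker only has to correct the residual between the predicted and the actual position.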

On the basis of any one of the above embodiments, the method further comprises: adding new current key frames in the first sliding window or the second sliding window, and marginalizing historical key frames.

Specifically, the first sliding window and the second sliding window in the visual SLAM device based on key frame optimization are each limited to a fixed number of key frames, so a new current key frame is added to a sliding window in real time and a historical key frame is removed, which prevents excessive consumption of the computing power of the SLAM system.

The criterion for adding a current key frame to either sliding window is to judge each current image frame against a preset condition: if the current image frame meets the preset condition, it is added to the sliding window as the current key frame; if it does not, the next image frame is read and the judgment is repeated.

The criterion for removing a key frame from the sliding window is that if 90% of the feature points in a historical key frame are all observed by 3 consecutive historical key frames, that key frame needs to be removed.
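The window maintenance described above can be sketched as a fixed-capacity queue plus a redundancy test. This is a simplified illustration with hypothetical names: each key frame is reduced to the set of map point identifiers it observes, removal from the window stands in for marginalization, and the redundancy test counts any three other key frames rather than checking that they are consecutive.

```python
from collections import deque
from typing import Deque, FrozenSet, Iterable, List, Set, Tuple


class KeyFrameWindow:
    """Sliding window that holds at most `capacity` key frames, oldest first."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.frames: Deque[Tuple[int, FrozenSet[int]]] = deque()

    def add(self, frame_id: int, point_ids: Iterable[int]) -> List[int]:
        """Insert a new key frame and return the ids of the removed ones."""
        self.frames.append((frame_id, frozenset(point_ids)))
        removed = []
        while len(self.frames) > self.capacity:
            removed.append(self.frames.popleft()[0])
        return removed


def is_redundant(points: Set[int], others: List[Set[int]],
                 ratio: float = 0.9, min_views: int = 3) -> bool:
    """True when at least `ratio` of the points are seen by `min_views` others."""
    if not points:
        return False
    covered = sum(1 for p in points
                  if sum(p in other for other in others) >= min_views)
    return covered >= ratio * len(points)
```

Keeping the window bounded keeps the size of each optimization problem roughly constant, whatever the length of the trajectory.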

When the loop detection result of the current key frame is true, loop correction is performed by using a closed-loop constraint.

Specifically, the visual SLAM device based on keyframe optimization performs loop detection on the current keyframe, and the loop detection result includes: a loopback detection logic true and a loopback detection logic false, wherein:

If the loop detection result is logically true, that is, the motion route of the visual SLAM device based on key frame optimization contains a loop (a return to a previously visited position), the error is eliminated by means of a closed-loop constraint, and loop correction is performed.

If the loop detection result is logically false, that is, the motion route of the visual SLAM device based on key frame optimization contains no loop (no return to a previously visited position), no processing is required.

Fig. 4 is a schematic structural diagram of a visual SLAM device based on key frame optimization according to the present invention. On the basis of any of the above embodiments, as shown in fig. 4, the apparatus includes a receiving module 410, an initial pose acquisition module 420, a key frame matching module 430, a common view relationship establishing module 440, a first bundling module 450, and a second bundling module 460, where:

A receiving module 410, configured to receive a current image frame from an image acquisition device.

An initial pose acquisition module 420, configured to perform matching by an optical flow method based on the current image frame and the previous image frame of the current image frame, and acquire first pose information.

A key frame matching module 430, configured to, when the current image frame meets a preset condition, add the current image frame to the first sliding window as a current key frame, and perform matching between the current key frame and the previous key frame of the current key frame by a descriptor-based matching method to obtain map points.

A common-view relationship establishing module 440, configured to screen out target historical key frames from the first sliding window with the map points as constraint conditions, and establish a common-view directed connection graph based on the target historical key frames and the current key frame.

A first bundling module 450, configured to perform local bundle adjustment based on the common-view directed connection graph and the first pose information, and acquire pose estimation information.

A second bundling module 460, configured to perform local bundle adjustment based on a second sliding window and the pose estimation information, acquire second pose information and updated map points, so that a SLAM scene map is established based on the updated map points.

Specifically, the receiving module 410, the initial pose acquiring module 420, the key frame matching module 430, the common view relationship establishing module 440, the first bundling module 450, and the second bundling module 460 are electrically connected in sequence.

The receiving module 410 receives a continuous image sequence transmitted by the image acquisition device in real time, and reads the latest image frame as the current image frame for processing.

The initial pose acquisition module 420 extracts feature points existing in a previous image frame of the current image frame, performs matching tracking on the feature points with the current image frame, and may calculate first pose information corresponding to the current image frame by using pixel coordinate information of the matched feature points.

The key frame matching module 430 judges the current image frame against the preset condition, and the judgment has two possible results: success and failure.

If the judgment succeeds, that is, the current image frame meets the preset condition, the current image frame is taken as the current key frame and stored in the sliding window in sequence; preselected feature points are collected from the current key frame and matched and tracked against the previous key frame of the current key frame in the sliding window, and the preselected feature points that can be tracked are taken as map points.
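Descriptor-based matching between two key frames can be sketched with binary descriptors compared by Hamming distance. The sketch assumes each descriptor is stored as a Python integer (a real ORB descriptor has 256 bits), uses mutual nearest-neighbour checking to reject ambiguous pairs, and takes a hypothetical distance threshold; none of these choices is prescribed by the embodiment.

```python
from typing import List, Sequence, Tuple


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")


def match_descriptors(desc_a: Sequence[int], desc_b: Sequence[int],
                      max_distance: int = 64) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) that are mutual nearest neighbours."""
    if not desc_a or not desc_b:
        return []

    def nearest(d: int, pool: Sequence[int]) -> int:
        return min(range(len(pool)), key=lambda k: hamming(d, pool[k]))

    matches = []
    for i, d in enumerate(desc_a):
        j = nearest(d, desc_b)
        # Keep the pair only if the choice is mutual and close enough.
        if nearest(desc_b[j], desc_a) == i and hamming(d, desc_b[j]) <= max_distance:
            matches.append((i, j))
    return matches
```

Brute-force comparison costs more per frame than optical flow tracking, which is consistent with applying it only to key frames.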

The common-view relationship establishing module 440 performs matching again between the key frames in the first sliding window by the ORB (Oriented FAST and Rotated BRIEF) descriptor method; that is, feature points are detected by the FAST feature point detection method, and the N feature points with the highest Harris corner response values are then selected from the FAST feature points by the Harris corner measure. The preselected map points corresponding to the N matched feature point pairs are set as map points. All historical key frames that can observe these map points are extracted from the first sliding window and, combined with the current key frame, used to establish the common-view directed connection graph required in graph optimization.
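The two selection steps in this paragraph, keeping the N strongest corners and collecting the historical key frames that share map points with the current key frame, can be sketched as follows. The data layout is hypothetical: features are (point, response) pairs, each key frame is a set of map point identifiers, and an edge is directed from a historical key frame to the current key frame and weighted by the number of shared map points.

```python
from typing import Dict, List, Set, Tuple

Feature = Tuple[Tuple[float, float], float]  # ((x, y), Harris response)


def top_n_by_response(features: List[Feature], n: int) -> List[Feature]:
    """Keep the n features with the highest corner response."""
    return sorted(features, key=lambda f: f[1], reverse=True)[:n]


def build_covisibility_edges(current_id: int, current_points: Set[int],
                             history: Dict[int, Set[int]],
                             min_shared: int = 1) -> Dict[Tuple[int, int], int]:
    """Edges (historical id, current id) -> number of shared map points."""
    edges = {}
    for kf_id, points in history.items():
        shared = len(current_points & points)
        if shared >= min_shared:
            edges[(kf_id, current_id)] = shared
    return edges
```

Key frames that share no map point with the current key frame contribute no constraint and are therefore left out of the graph.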

The first bundling module 450 confines the BA optimization to the dynamic first sliding window. After the current key frame has been added to the first sliding window and the old key frame has been marginalized, it iteratively optimizes the current pose estimation information of the camera (that is, the optimized rotation change R and translation change t) with the goal of minimizing the reprojection error between two adjacent key frames in the first sliding window.

The second bundling module 460 builds global constraints with a larger time and space span from the relationships between the key frames stored in the second sliding window, performs local bundle adjustment (BA) to minimize the reprojection error, and optimizes the latest pose estimation information on the principle of keeping the historical trajectory smooth, thereby acquiring the second pose information. Only the pose of the current key frame and the map points are updated and optimized, so as to weaken or eliminate the accumulated SLAM error, and a SLAM scene map is then built from the updated map points.
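The quantity minimized by BA, the reprojection error, can be sketched for a single pose as below. The sketch assumes a pinhole camera without skew or lens distortion and a pose (R, t) that maps world points into the camera frame; it only evaluates the cost, and the iterative minimization itself (for example by Gauss-Newton or Levenberg-Marquardt) is not shown. The function names are illustrative.

```python
from typing import List, Sequence, Tuple

Observation = Tuple[Sequence[float], Tuple[float, float]]  # (3D point, pixel)


def reprojection_error(K: Sequence[Sequence[float]],
                       R: Sequence[Sequence[float]],
                       t: Sequence[float],
                       point: Sequence[float],
                       observed: Tuple[float, float]) -> float:
    """Squared pixel distance between the projected point and its observation."""
    cam = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    u = K[0][0] * cam[0] / cam[2] + K[0][2]
    v = K[1][1] * cam[1] / cam[2] + K[1][2]
    return (u - observed[0]) ** 2 + (v - observed[1]) ** 2


def total_cost(K: Sequence[Sequence[float]],
               R: Sequence[Sequence[float]],
               t: Sequence[float],
               observations: List[Observation]) -> float:
    """Sum of squared reprojection errors over all observations of one pose."""
    return sum(reprojection_error(K, R, t, p, obs) for p, obs in observations)
```

An optimizer would adjust R, t and the 3D points so that this sum decreases. When only the latest key frame pose is treated as a variable, the historical poses enter the cost as fixed constraints.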

Optionally, the initial pose acquisition module 420 includes an image preprocessing unit, a common frame matching unit, and a pose resolving unit, where:

An image preprocessing unit, configured to preprocess the current image frame and screen out target feature points from the previous image frame of the current image frame.

A common frame matching unit, configured to match the current image frame and the previous image frame of the current image frame by a pyramid optical flow method based on the target feature points to obtain a matching point set.

A pose resolving unit, configured to perform six-degree-of-freedom pose calculation based on the matching point set to acquire the first pose information.

Optionally, the first bundling module 450 includes a map point screening unit and an initial pose acquiring unit, where:

A map point screening unit, configured to perform a fusion operation by using the current key frame and each historical key frame in the common-view directed connection graph, and screen out target map points from the map points.

An initial pose acquisition unit, configured to perform local bundle adjustment based on the first pose information and the target map points, acquire the pose estimation information, and update the target map points.

Optionally, the image preprocessing unit includes an equalization subunit and a feature point preprocessing subunit, wherein:

An equalization subunit, configured to perform histogram equalization on the current image frame and acquire the processed current image frame.

A feature point preprocessing subunit, configured to perform motion estimation on the feature points of the previous image frame of the current image frame based on the target model to obtain the target feature points.

Optionally, the target model includes one of a uniform speed change model, a constant velocity model, and a variable speed model.

Optionally, the apparatus further comprises a sliding window maintenance module and a loop back correction module, wherein:

A sliding window maintenance module, configured to add a new current key frame to the first sliding window or the second sliding window and marginalize a historical key frame.

A loop correction module, configured to perform loop correction by using a closed-loop constraint when the loop detection result of the current key frame is true.

The visual SLAM device based on key frame optimization provided by the embodiment of the invention is used for executing the visual SLAM method based on key frame optimization provided by the invention, the implementation mode of the visual SLAM device based on key frame optimization provided by the invention is consistent with that of the visual SLAM method based on key frame optimization provided by the invention, and the same beneficial effects can be achieved, and the detailed description is omitted here.

The embodiment of the invention performs tracking and matching based on an optical flow method and derives the first pose information by chained recursion from the matching relationship between the current image frame and the adjacent common frame. The current key frame is selected from the current image frames by a preset condition and is tracked and matched against the adjacent key frame in the sliding window by a descriptor method to obtain map points, and hierarchical, step-by-step local bundle adjustment (BA) is performed with the map points to obtain the smoothed second pose information of the current key frame and the related map points. A certain amount of precision is sacrificed in SLAM front-end processing in exchange for low computational cost; in SLAM back-end processing, the first-level BA yields the latest pose information, and the second-level BA keeps the new track point smooth within the motion trajectory so that it connects seamlessly with the historical trajectory, which improves the precision of key frame pose optimization in SLAM back-end processing.

Fig. 5 illustrates a physical structure diagram of an electronic device. As shown in Fig. 5, the electronic device may include: a processor 510, a communication interface 520, a memory 530 and a communication bus 540, wherein the processor 510, the communication interface 520 and the memory 530 communicate with each other via the communication bus 540. The processor 510 may invoke logic instructions in the memory 530 to perform a visual SLAM method based on key frame optimization, the method comprising: receiving a current image frame from an image acquisition device; performing matching by an optical flow method based on the current image frame and the previous image frame of the current image frame to acquire first pose information; when the current image frame meets a preset condition, adding the current image frame to a first sliding window as a current key frame, and performing matching between the current key frame and the previous key frame of the current key frame by a descriptor-based matching method to obtain map points; screening out target historical key frames from the first sliding window with the map points as constraint conditions, and establishing a common-view directed connection graph based on the target historical key frames and the current key frame; performing local bundle adjustment based on the common-view directed connection graph and the first pose information to acquire pose estimation information; and performing local bundle adjustment based on a second sliding window and the pose estimation information to acquire second pose information and updated map points, so that a SLAM scene map is established based on the updated map points; wherein the first sliding window stores the key frames collected while the carrier travels a preset distance, the second sliding window stores the key frames within a preset historical duration together with the key frames collected while the carrier travels the preset distance, and the second pose information is the pose information of the current key frame after optimization.

In addition, the logic instructions in the memory 530 may be implemented in the form of software functional units and, when sold or used as a stand-alone product, stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.

In another aspect, the present invention also provides a computer program product, which includes a computer program that can be stored on a non-transitory computer-readable storage medium; when the computer program is executed by a processor, the computer can execute the visual SLAM method based on key frame optimization provided by the above methods, the method comprising: receiving a current image frame from an image acquisition device; performing matching by an optical flow method based on the current image frame and the previous image frame of the current image frame to acquire first pose information; when the current image frame meets a preset condition, adding the current image frame to a first sliding window as a current key frame, and performing matching between the current key frame and the previous key frame of the current key frame by a descriptor-based matching method to obtain map points; screening out target historical key frames from the first sliding window with the map points as constraint conditions, and establishing a common-view directed connection graph based on the target historical key frames and the current key frame; performing local bundle adjustment based on the common-view directed connection graph and the first pose information to acquire pose estimation information; and performing local bundle adjustment based on a second sliding window and the pose estimation information to acquire second pose information and updated map points, so that a SLAM scene map is established based on the updated map points; wherein the first sliding window stores the key frames collected while the carrier travels a preset distance, the second sliding window stores the key frames within a preset historical duration together with the key frames collected while the carrier travels the preset distance, and the second pose information is the pose information of the current key frame after optimization.

In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program performs the visual SLAM method based on key frame optimization provided by the above methods, the method comprising: receiving a current image frame from an image acquisition device; performing matching by an optical flow method based on the current image frame and the previous image frame of the current image frame to acquire first pose information; when the current image frame meets a preset condition, adding the current image frame to a first sliding window as a current key frame, and performing matching between the current key frame and the previous key frame of the current key frame by a descriptor-based matching method to obtain map points; screening out target historical key frames from the first sliding window with the map points as constraint conditions, and establishing a common-view directed connection graph based on the target historical key frames and the current key frame; performing local bundle adjustment based on the common-view directed connection graph and the first pose information to acquire pose estimation information; and performing local bundle adjustment based on a second sliding window and the pose estimation information to acquire second pose information and updated map points, so that a SLAM scene map is established based on the updated map points; wherein the first sliding window stores the key frames collected while the carrier travels a preset distance, the second sliding window stores the key frames within a preset historical duration together with the key frames collected while the carrier travels the preset distance, and the second pose information is the pose information of the current key frame after optimization.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A visual SLAM method based on keyframe optimization, comprising:

receiving a current image frame from an image acquisition device;

performing local bundle adjustment based on the common-view directed connected graph and the first pose information to obtain pose estimation information;

2. The visual SLAM method based on keyframe optimization as recited in claim 1, wherein said obtaining the first pose information based on said current image frame and a previous image frame of said current image frame by optical flow matching comprises:

3. The visual SLAM method based on keyframe optimization of claim 1, wherein the performing local bundle adjustment based on the common-view directed connected graph and the first pose information to obtain the pose estimation information comprises:

and performing local bundle adjustment based on the first pose information and the target map points to obtain the pose estimation information and update the target map points.

4. The method of claim 2, wherein the pre-processing the current image frame and screening out target feature points from a previous image frame of the current image frame comprises:

5. The visual SLAM method based on keyframe optimization as defined in claim 4, wherein the objective model comprises one of a ramp model, a uniform model, and a shift model.

6. The visual SLAM method based on keyframe optimization of claim 1, further comprising:

and performing loop correction using a closed-loop constraint under the condition that the loop detection result of the current key frame is true.

7. A visual SLAM apparatus based on keyframe optimization, comprising:

the first clustering module is used for performing local bundle adjustment based on the common-view directed connected graph and the first pose information to acquire pose estimation information;

the first sliding window is used for storing the key frames acquired while the carrier travels a preset distance, the second sliding window is used for storing the key frames acquired within a preset historical duration and the key frames acquired while the carrier travels the preset distance, and the second pose information is the optimized pose information of the current key frame.

8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program implements the steps of the keyframe optimization based visual SLAM method as recited in any one of claims 1 to 6.

9. A non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the steps of the keyframe optimization-based visual SLAM method as recited in any one of claims 1 to 6.

10. A computer program product comprising a computer program, wherein the computer program when executed by a processor implements the steps of the key frame optimization based visual SLAM method of any of claims 1 to 6.