CN112219087A

CN112219087A - Pose prediction method, map construction method, movable platform and storage medium

Info

Publication number: CN112219087A
Application number: CN201980033455.4A
Authority: CN
Inventors: 朱张豪
Original assignee: SZ DJI Technology Co Ltd
Current assignee: SZ DJI Technology Co Ltd; SZ DJI Innovations Technology Co Ltd
Priority date: 2019-08-30
Filing date: 2019-08-30
Publication date: 2021-01-12
Also published as: WO2021035669A1

Abstract

A position and orientation predicting method, a map constructing method, a movable platform and a storage medium, a first time period is obtained according to a time stamp of a current image frame and a time stamp of a last image frame (S101); acquiring measurement data of a first encoder and measurement data of a second encoder in the first time period (S102); obtaining a predictive transformation matrix of a current image frame relative to a previous image frame according to the measurement data of the first encoder and the measurement data of the second encoder in the first time period (S103); and obtaining the predicted pose of the current image frame according to the predicted change matrix and the pose information of the previous image frame (S104). By applying the encoder to the visual SLAM and predicting the pose of the current image frame through the data measured by the encoder, the pose acquisition accuracy can be improved, the stability and robustness of the visual SLAM are improved, and the visual SLAM can still stably run when the vision fails.

Description

Pose prediction method, map construction method, movable platform and storage medium

Technical Field

The embodiment of the invention relates to the field of machine vision, in particular to a pose prediction method, a map construction method, a movable platform and a storage medium.

Background

Visual SLAM (Simultaneous Localization and Mapping, instantaneous Localization and Mapping) refers to a process in which a robot creates a map according to an image acquired by a camera in a completely unknown environment under the condition that the position of the robot is uncertain, and performs autonomous Localization and navigation by using the map.

When special problems such as few characteristic points (no texture) in an image, too many repetition with a previous image frame, blurring caused by high-speed motion, no overlapping area, accidental reduction of a camera frame rate and the like occur, the tracking function of the vision SLAM is disabled, that is, the relative pose of the current image frame relative to the previous image frame cannot be obtained, so that the vision SLAM cannot continue to work, and the vision can be recovered only by a repositioning mode when a closed loop is triggered.

The existing visual SLAM method is poor in stability and robustness, and vision can be recovered only when a closed loop is required when visual failure occurs, so that the visual SLAM is interrupted when the visual failure occurs, map points and tracks in the visual failure process are lost, the map points and the tracks before and after the visual failure cannot be connected, and the visual SLAM cannot stably run.

Disclosure of Invention

The embodiment of the invention provides a pose prediction method, a map construction method, a movable platform and a storage medium, which are used for improving the accuracy of pose acquisition of a current image frame, improving the stability and robustness of a visual SLAM and ensuring that the visual SLAM can still stably run when the vision fails.

The first aspect of the embodiments of the present invention provides a pose prediction method, which is applicable to a movable platform, where the movable platform includes a first rolling wheel, a second rolling wheel, a camera, a first encoder, and a second encoder, where the first rolling wheel and the second rolling wheel are used to displace the movable platform, and axes of the first rolling wheel and the second rolling wheel are coincident; the camera is used for collecting image frames, the first encoder is used for detecting the speed of the first rolling wheel, and the second encoder is used for detecting the speed of the second rolling wheel; the method comprises the following steps:

acquiring a first time period according to the time stamp of the current image frame and the time stamp of the last image frame;

acquiring measurement data of a first encoder and measurement data of a second encoder in the first time period;

obtaining a prediction transformation matrix of the current image frame relative to the previous image frame according to the measurement data of the first encoder and the measurement data of the second encoder in the first time period;

and obtaining the predicted pose of the current image frame according to the predicted change matrix and the pose information of the previous image frame.

A second aspect of an embodiment of the present invention provides a movable platform, including: the device comprises a first rolling wheel, a second rolling wheel, a camera, a first encoder, a second encoder, a memory and a processor;

the first rolling wheel and the second rolling wheel are used for enabling the movable platform to displace, and the axes of the first rolling wheel and the second rolling wheel are overlapped;

the camera is used for collecting image frames;

the first encoder is configured to detect a velocity of the first scroll wheel, and the second encoder is configured to detect a velocity of the second scroll wheel;

the memory is used for storing program codes;

the processor, invoking the program code, when executed, is configured to:

A third aspect of embodiments of the present invention is to provide a computer-readable storage medium, on which a computer program is stored, the computer program being executed by a processor to implement the method of the first aspect.

A fourth aspect of the embodiments of the present invention is to provide a map construction method, which is applicable to a movable platform, where the movable platform includes a first rolling wheel, a second rolling wheel, a camera, a first encoder, and a second encoder, where the first rolling wheel and the second rolling wheel are used to displace the movable platform, and axes of the first rolling wheel and the second rolling wheel are overlapped; the camera is used for collecting image frames, the first encoder is used for detecting the speed of the first rolling wheel, and the second encoder is used for detecting the speed of the second rolling wheel; the method comprises the following steps:

obtaining the predicted pose of the current image frame according to the predicted change matrix and the pose information of the previous image frame;

after the predicted pose of the current image frame is obtained, on the basis of a constructed map, updating the constructed map according to the current image frame; the constructed map is constructed from a keyframe determined prior to the current image frame.

A fifth aspect of an embodiment of the present invention provides a movable platform, including: the device comprises a first rolling wheel, a second rolling wheel, a camera, a first encoder, a second encoder, a memory and a processor;

the camera is used for collecting image frames;

the memory is used for storing program codes;

the processor, invoking the program code, when executed, is configured to:

A sixth aspect of embodiments of the present invention is to provide a computer-readable storage medium, on which a computer program is stored, the computer program being executed by a processor to implement the method of the fourth aspect.

According to the pose prediction method, the map construction method, the movable platform and the storage medium, the first time period is obtained according to the time stamp of the current image frame and the time stamp of the last image frame; acquiring measurement data of a first encoder and measurement data of a second encoder in the first time period; obtaining a prediction transformation matrix of the current image frame relative to the previous image frame according to the measurement data of the first encoder and the measurement data of the second encoder in the first time period; and obtaining the predicted pose of the current image frame according to the predicted change matrix and the pose information of the previous image frame. According to the embodiment of the invention, the encoder is applied to the visual SLAM, and the pose of the current image frame is predicted through the data measured by the encoder, so that the pose acquisition accuracy can be improved, the stability and the robustness of the visual SLAM are improved, and the visual SLAM can still stably run when the vision fails.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive labor.

Fig. 1 is a flowchart of a map construction method according to an embodiment of the present invention;

fig. 2 is a flowchart of a pose prediction method according to an embodiment of the present invention;

FIG. 3 is a flowchart of a mapping method according to another embodiment of the invention;

FIG. 4 is a flowchart of a mapping method according to another embodiment of the invention;

FIG. 5 is a flowchart of a mapping method according to another embodiment of the invention;

FIG. 6 is a flowchart of a mapping method according to another embodiment of the invention;

fig. 7 is a schematic structural diagram of a movable platform according to another embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be described clearly below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It will be understood that when an element is referred to as being "secured to" another element, it can be directly on the other element or intervening elements may also be present. When a component is referred to as being "connected" to another component, it can be directly connected to the other component or intervening components may also be present.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.

Some embodiments of the invention are described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.

The embodiment of the invention provides a map construction method. The map building method is applied to a movable platform, which may be a robot, an unmanned vehicle, or the like, as shown in fig. 7, the movable platform 700 specifically includes a first scroll wheel 701, a second scroll wheel 702, a first encoder 703, a second encoder 704, a camera 705, and further may further include a processor 706 and a memory 707. Wherein the first scroll wheel 701 and the second scroll wheel 702 are used for displacing the movable platform, and the axes of the first scroll wheel 701 and the second scroll wheel 702 are coincident; the camera 705 is configured to capture image frames, the first encoder 703 is configured to detect a velocity of the first scroll wheel 701, and the second encoder 704 is configured to detect a velocity of the second scroll wheel 702.

The camera 705 may adopt a monocular camera, a binocular camera, an RGBD depth camera, etc., and when the camera 705 adopts a monocular camera, the camera is usually used in close coupling cooperation with an Inertial Measurement Unit (IMU) to determine a map scale, where the IMU is a Measurement Unit integrating two types of sensors, i.e., a Gyroscope (gyro) for measuring three-dimensional rotation and an Accelerometer (Accelerometer) for measuring three-dimensional acceleration; of course, the camera 705 may be used in cooperation with an IMU when using a binocular camera or an RGBD depth camera, which may further improve accuracy, but use the IMU with higher requirements for sensor calibration. An encoder (encoder) is an angular displacement sensor, measures the speed of a wheel by detecting the number of radians rotated by the wheel of a robot within a certain time, and is mainly divided into a photoelectric type, a contact type and an electromagnetic type.

As shown in fig. 1, the map construction method according to the embodiment of the present invention may be divided into a tracking thread S11, a local mapping thread S12, and a loop detection thread S13, where the tracking thread S11 includes predicting the pose of the image frame, and as shown in fig. 2, the predicting the pose of the image frame specifically includes:

step S101, acquiring a first time period according to the time stamp of the current image frame and the time stamp of the last image frame.

After any image frame is acquired, the image may be first preprocessed, and the preprocessing process specifically includes:

for the RGBD camera, the collected images comprise an RGB image and a depth image, the GRB image can be converted into a gray image, the characteristic points of the gray image are extracted, the three-dimensional coordinates of the characteristic points are calculated according to the gray image and the depth image, namely the coordinates of the characteristic points relative to a false right camera, and the images collected by the RGBD camera are equivalent to the images collected by a binocular camera.

And for the binocular camera, converting the acquired image into a gray map, extracting feature points from the gray map, and calculating the depth of the feature points according to the matching of the feature points in the images of the left camera and the right camera to obtain the three-dimensional coordinates of the feature points.

It should be noted that, the extracted feature points in this embodiment may be ORB (organized brief) feature points, and the ORB feature is a rotation invariance feature, which has high stability, is insensitive to illumination and moving objects, and has a small calculation amount, and can also perform real-time processing only with the CPU of a conventional notebook computer; of course, the Feature points may be Harris corner points, SIFT (Scale-innovative Feature Transform) Feature points, surf (speeded Up Robust features), and the like.

For the application of the feature points, it will be described in detail below.

And step S102, acquiring the measurement data of the first encoder and the measurement data of the second encoder in the first time period.

Preferably, after the measurement data of the first encoder or the second encoder is acquired, it may be determined whether the measurement data is available, specifically as follows:

caching the measured data into a list, and recording the acquisition time corresponding to the measured data;

obtaining the measurement data closest to the current image frame acquisition time from a buffer list, and obtaining the time error between the acquisition time of the measurement data and the current image frame acquisition time;

and if the time error exceeds a preset time threshold, indicating that the measurement data is unavailable.

In this embodiment, the acquisition frequency of the encoder measurement is usually higher, for example, 200Hz, while the acquisition frequency of the image is usually lower, for example, 30Hz, and it is generally impossible to find the encoder measurement data that is acquired at the same time as the image, so that the encoder measurement data closest to the image acquisition time can be searched, that is, the time error between the acquisition time of the encoder measurement data and the acquisition time of the current image frame does not exceed the preset time threshold. When the system is unstable, encoder measurement data may be missing, that is, encoder data whose time error with the current image frame acquisition time does not exceed a preset time threshold cannot be found, that is, encoder measurement data before and after the current image frame acquisition time is unreliable, and the unreliable encoder measurement data can be discarded, and only a pure visual SLAM in the prior art (that is, pose prediction and map construction are performed only by means of vision) is adopted. Similarly, a similar determination process may be used for the measurement data of the IMU.

And step S103, obtaining a predictive transformation matrix of the current image frame relative to the previous image frame according to the measurement data of the first encoder and the measurement data of the second encoder in the first time period.

And step S104, obtaining the predicted pose of the current image frame according to the predicted change matrix and the pose information of the previous image frame.

In this embodiment, the pose of the camera of the current image frame may be predicted according to the encoder data, so as to obtain the predicted pose of the current image frame. Specifically, it is assumed that the motion of the movable platform is a two-dimensional motion (plane or curved surface) and the ground has enough friction, wherein the curved surface can be approximated to a plane in a short time by using the idea similar to taylor expansion, i.e. the frame rate is high enough to handle the curved surface motion.

Obtaining a predictive transform matrix T from encoder data₁₀Wherein the transformation matrix T is predicted₁₀The three-dimensional translation vector comprises a 3 multiplied by 3 rotation matrix R and a three-dimensional translation vector t, wherein 1 represents a current image frame camera coordinate system, and 0 represents a previous image frame camera coordinate system; pose T combined with previous image frame_0w(w is a world coordinate system), and the predicted pose T of the current image frame can be obtained_1w＝T₁₀T_0w。

In the present embodiment, the predictive transform matrix T may be derived by a planar motion model₁₀Such as a model of the motion of a differential two-wheeled vehicle.

Measured velocity v of a current image frame of a movable platform_EEAnd measuring angular velocity omega_EEAs shown in equation (1):

wherein the velocity v is measured_EEX-axis component v and the measured angular velocity ω_EEThe z-axis component ω of (a) satisfies the motion model of the differential two-wheel vehicle, as shown in equation (2) (where E represents the current image frame encoder coordinate system, with the center of the two-wheel track as the origin, and the x-axis pointing forward):

wherein the content of the first and second substances,

and

a first and a second scroll wheel speed measured by the encoder (all data between the encoder data closest to the current and the last image frame acquisition time), respectively. It will be appreciated that in obtaining the predictive transform matrix T₁₀In the process, all of the data in the first time period can be acquired

And

data, also can obtain parts

And

and (4) data.

Measuring the velocity v of the current image frame_EEAnd measuring angular velocity omega_EESubstituted into the differential equation (3) of the motion model:

where Ei represents the last image frame encoder coordinate system, ω_EEAn antisymmetric array representing the measured angular velocity of the current image frame,

and

respectively representing the current image frame rotation matrix and the first derivative of position with respect to time.

The differential formula of the motion model is integrated to obtain a predicted rotation matrix between two frames

And a predictive translation matrix

(the arrows indicate the estimation using the encoder information, assuming that the information does not contain any noise

). The process is a pre-integration process for obtaining a two frame prediction relationship.

Further, a rotation matrix is predicted

And a predictive translation matrix

Converting the coordinate system of the encoder into the coordinate system of the camera, so that a prediction transformation matrix between i and j frames can be obtained as shown in formula (4):

that is, T_CjCiTransforming the matrix T for the prediction described above₁₀. Wherein a fixed transformation matrix T of the camera coordinate system used in the conversion with respect to the encoder coordinate system_CECalibration is required first.

Further, by the formula T_1w＝T₁₀T_0wThe current image can be obtainedPredicted pose T of frame_1w. According to the embodiment of the invention, the encoder is applied to the visual SLAM, and the pose of the current image frame is predicted by the data measured by the encoder, so that the problems of visual failure such as blurring or no overlapping area caused by few (no texture) or too many repeated characteristic points and high-speed motion, accidental reduction of the frame rate of a camera and the like can be solved, map points and tracks before and after failure can be connected, an SLAM system cannot be interrupted, and the stability and robustness of the visual SLAM are improved.

The tracking thread S11 further includes optimizing the predicted pose, specifically:

optimizing the predicted pose of the current image frame by the predicted pose of the current image frame through a preset pure Motion beam adjustment (Motion-only BA: Motion-only bundle adjustment) to obtain the optimized pose of the current image frame; wherein the pure motion beam adjustment includes a reprojection error and an encoder error.

In the embodiment of the invention, the predicted pose of the current image frame is optimized through pure motion beam adjustment, wherein BA (bundle adjustment), i.e. adjusting camera parameters (mainly referring to extrinsic parameters: poses of a plurality of key frames; and possibly including intrinsic parameters: focal length and optical center pixel coordinates) and structure (position of map points seen by the camera), so that a reprojection error (i.e. an edge connecting two key frames corresponds to a stack of matching point pairs, wherein a point on the key frame 1 is projected onto the key frame 2 through a geometric relationship and then is different from a coordinate of the matching point) reaches a minimum (i.e. the error reaches a minimum as a quadratic sum of two norms of an information matrix as an optimized objective function), while Motion-only BA only adjusts camera parameters (mainly referring to the poses of many key frames), the optimized objective function is still a non-linear optimization method based on reprojection errors.

And the re-projection error re-projects the feature points on the prediction pose of the current image frame through the map points of the local map, and calculates the error between the re-projected feature points and the feature points of the current image frame.

The optimization formula of the objective function for pure motion beam adjustment can be as shown in formula (5):

wherein the content of the first and second substances,

for the maximum likelihood estimation of the pose, the part in the right min is an objective function, the first item is a priori error which is generally 0, but when the pose is closely coupled with the IMU, if the poses of two frames before and after the pose is optimized simultaneously, a Hessian matrix in the optimization process of Motion-only BA of the previous image frame can be used as an information matrix sigma₀ ^-1Keeping the pose of the previous image frame as unchanged as possible; the second term is visual observation z_ilA corresponding reprojection error term; the last item is encoder information E_ijThe formed error term (i.e., the square of the modulus of the error vector with respect to the covariance matrix).

In this embodiment, the encoder error term can be obtained by the following equation (6):

wherein e_RRepresenting the error of the relative rotation matrix (which is a three-dimensional vector using a logarithmic mapping or its lie algebra representation), ep representing the error of the relative translation (three-dimensional vector), i.e. the encoder error

Is a six-dimensional vector corresponding to a covariance matrix

Is a 6 x 6 square matrix, the inverse of which, the information matrix, is used here. e.g. of the type_R、e_pAnd covariance matrix

All according to the above, the rotation matrix is obtained by pre-integration

And relative translation

And (4) calculating.

The result of the pre-integration and the following formula, i.e. the relationship between the relative transformation in the encoder coordinate system and the relative transformation in the camera coordinate system, are used in calculating the encoder error term in the objective function:

namely:

wherein R is_CiWAnd p_CiWIs the pose R of the last image frame under the world coordinate system_CEAnd p_CENamely, the fixed transformation matrix T related in the formula (4) needs to be calibrated in advance_CEOf (2) is used.

The Motion-only BA described above uses non-linear optimization to calculate the optimal pose R of the current image frame that minimizes the objective function_CjWAnd p_CjW。

In addition, the Jacobian matrix of the Jacobian used in the nonlinear optimization process in this embodiment may be a numerical value, or an analytic solution derived from a lie algebra correlation formula may be used, and the latter may make the optimization more stable.

In another preferred embodiment, the pure motion beam adjustment further comprises an IMU error term, when pose optimization is also performed using an IMU (inertial measurement unit).

The optimization formula for the pure motion beam adjusted objective function can be as shown in equation 8:

wherein the content of the first and second substances,

is an IMU error term.

In the prior art, for obtaining the pose of the current image frame, a predictive violent matching mode is generally adopted, a predictive transformation matrix (comprising a rotation matrix and a translation vector) of a uniform motion model is adopted to project the feature points of the previous image frame to the current image frame, a small rectangular range is selected for carrying out rapid violent matching, the pose and the speed of the previous image frame are utilized to estimate the initial pose of the current image frame, and then the initial pose is optimized to obtain an optimized pose. The invention estimates the initial pose of the current image frame by adopting the wheel speed measured by the encoder, and then optimizes the initial pose to obtain the optimized pose, thereby having higher robustness.

Further, on the basis of the above embodiment, as shown in fig. 3, after the pure motion beam adjustment in step S201 is performed, the method may further include:

step S202, judging whether the vision is invalid or not;

if the vision is disabled, go to step S203; if the vision is not disabled, go to step S204;

step S203, taking the predicted pose of the current image frame as the pose of the current image frame;

and S204, taking the optimized pose of the current image frame as the pose of the current image frame.

In this embodiment, when pure motion beam adjustment fails, that is, when the optimal pose cannot be obtained through the objective function of the pure motion beam adjustment, it is determined that the vision is disabled. If the vision fails, the Motion-only BA process fails, namely the optimized pose obtained by final optimization is inaccurate and has larger deviation with the real pose, so that the predicted pose is not adopted as the pose of the current image frame; and the vision is not invalid, namely the Motion-only BA process is successful, and the optimized pose can be used as the pose of the current image frame.

In this embodiment, the visual failure may be caused by the following reasons: few feature points in the image (no texture) or too many repetitions with the previous image frame, blurring or no overlapping regions due to high-speed motion, occasional camera frame rate reduction, etc.

In a preferred embodiment, it is determined that the current image frame is visually disabled with respect to the previous image frame when the optimization result cannot be obtained by the pure motion beam adjustment model.

In another embodiment, when the visual failure is judged, the current image frame and the frame having the same feature point as the current image frame are subjected to feature point re-matching (or the map point in the map is subjected to feature point matching with the current image frame) through an optimized pose obtained by Motion-only BA, in the matching process, the same feature point in the frames as the current image frame can be projected into the current image frame, and if the distance between a certain feature point projected into the current image frame and the feature point corresponding to the current image frame is smaller than a threshold value, the certain feature point is determined to be an interior point; otherwise, the feature point is an outlier; judging whether the number of the internal points in the current image frame is greater than a preset number threshold, if so, indicating that the Motion-only BA is successful, namely the vision is not invalid; if the number is smaller than the preset number threshold, the Motion-only BA fails, namely, the visual failure is indicated.

In the present embodiment, in the tracking thread S11 of the visual SLAM, the pose information of the current image frame with respect to the previous image frame is acquired, that is, the motion trajectory of the movable platform can be tracked according to the relative pose.

In this embodiment, when the camera acquires an image of a current image frame, the pose of the current image frame camera may be predicted according to the pose of the previous image frame camera and the first rolling wheel speed and the second rolling wheel speed information detected by the encoder to obtain the predicted pose of the current image frame, and feature point matching is performed according to the predicted pose of the current image frame, the image of the current image frame, and a key frame having the same feature point as the current image frame to optimize the predicted pose of the current image frame to obtain the optimized pose of the current image frame.

When the vision is temporarily disabled due to the special problems of rare characteristic points (no texture) in the image, blurring or no overlapping area caused by too many repeated and high-speed motions with the previous image frame, accidental reduction of the camera frame rate and the like, that is, the pose of the current image frame cannot be optimized in a way of matching the characteristic points of the current image frame and the key frame with the same characteristic points as the current image frame, the predicted pose of the current image frame can be used as the pose information of the current image frame, so that the follow-up process can be ensured to be continued, and the problem that the work cannot be continued due to the fact that the map points and the tracks are not connected before and after the vision is disabled. Because the predicted pose of the current image frame is obtained based on the encoder, and the encoder does not have the characteristic of zero drift, the accumulated error in a short time can be ensured not to be very large, so that the transition problem from transient visual failure to visual recovery can be solved, and the stability and robustness of the visual SLAM are improved.

In this embodiment, after the pose information of the current image frame is acquired, the local mapping process S12 may be performed according to the image frame and the corresponding pose information. Specifically, it may be determined whether the current image frame is a key frame, and if the current image frame is determined to be a key frame, new map points may be added at the back end through matching of feature points of the key frame, so as to implement the construction of the local map and the optimization of the global map.

In the visual SLAM method provided by this embodiment, a camera image and encoder data are acquired; acquiring pose information of a camera according to the image frame and the measurement data of the encoder; and constructing a map according to the image frame and the pose information. By applying the encoder to the visual SLAM, the stability and robustness of the visual SLAM can be improved, and the situation that map points and tracks before and after visual failure cannot be connected to each other to cause incapability of continuous work when the visual failure is short can be avoided.

In the local mapping thread S12, as shown in fig. 4, after acquiring the pose of the current image frame, the method may further include:

step S301, judging whether the current image frame is a key frame;

if the current image frame is a key frame, the current image frame is added to the local map building thread S12 to build a map. Specifically, the method can comprise the following steps:

step S302, when the current image frame is determined to be a key frame, on the basis of a constructed map, updating the constructed map according to the current image frame;

wherein the constructed map is constructed from a keyframe determined prior to the current image frame.

In the embodiment, in order to reduce the consumption of computing resources in the map construction process, only key frames capable of providing enough map points are added into the map construction thread, and the map points with close distance can also be directly inserted into the map as new map points for the RGBD camera or the binocular camera.

In the prior art, it is usually determined whether the distance between a current image frame and a previous key frame is within a preset distance range, that is, the current image frame is used as a new key frame only when the distance between the current image frame and the previous key frame is not too far or too close; or when the matching number of the feature points of the current image frame is greater than a preset matching number threshold and enough new map points are provided, taking the current image frame as a new key frame, wherein the matching number of the feature points refers to the matching number of the feature points of the current image frame and map points in a map, or the matching number between the feature points of the current image frame and the feature points of the key frame with the same feature points as the current image frame; for the case that the monocular camera is tightly coupled with the IMU, it is further required that the time interval between the current image frame and the previous key frame is not too long in order to prevent the null shift, that is, when the time interval between the current image frame and the previous key frame exceeds the preset time interval threshold, the current image frame is certainly used as a new key frame.

In this embodiment, when the vision is not disabled, the key frame determination strategy in the prior art can be adopted; when the vision is invalid, because the predicted pose of the current image frame can be obtained through the encoder data, the two frames are allowed to be connected through the encoder (namely pure encoder edge pureneencoder edges are allowed), the distance between the current image frame and the last key frame is not required to be larger than the minimum value in a preset distance range, and the matching number of the feature points of the current image frame is not required to be larger than the threshold value of the preset matching number, so that the two limits can be removed; however, for the RGBD camera or the binocular camera, the minimum number limit of the map points created by the current image frame needs to be increased, that is, the number of the map points created by the current image frame needs to be greater than the preset map point number threshold, so as to prevent the contributing map points from losing the adding significance due to too few map points; for the case that the monocular camera is tightly coupled with the IMU, a minimum time interval limit is added to heuristically construct map points in the mapping thread (local mapping) through triangulation to recover visual tracking, that is, the time interval between the current image frame and the last key frame needs to be greater than a preset minimum time interval, and the redundant or unimportant key frames and map points can be removed in the mapping thread (local mapping), so that the problem that the time criterion adopted by the monocular camera causes redundancy is not worried.

Further, after the vision is recovered, the key frame judgment strategy in the case of the visual failure can be recovered to the key frame judgment strategy in the prior art.

Further, as shown in fig. 5, updating the constructed map according to the current image frame includes:

s401, taking out redundant key frames and redundant map points according to a preset strategy, and carrying out triangularization operation according to the current image frame to generate new map points;

step S402, after generating a new map point, constructing a common view of the current image frame to perform local beam adjustment so as to optimize other key frames which are in common view with the current image frame, the map point which can be seen by the current image frame and the map point which can be seen by the other key frames; wherein the local beam adjustment includes an encoder error term.

In this embodiment, after acquiring the new key frame, redundant map points may be removed first, where the redundant map points may include map points that are not near the current key frame and are seen too few times and map points with a smaller ratio of consecutive seen times in the previous frames, and feature points in the current image frame that can be fused with existing map points may be replaced by fusion.

After local BA optimization key frame pose and map points are performed, redundant key frames may also be removed, where a redundant key frame refers to a situation where there are at least 1 other key frame and the number of the key frames that see the same map points exceeds a threshold (e.g., 90%). By removing the redundant map points and the redundant key frames, the map points in the map can be prevented from being too many, the calculated amount is reduced, and the map building efficiency is improved.

Furthermore, as shown in fig. 6, the constructing the common view of the current image frame performs local beam adjustment, including:

step S501, when any key frame is connected with the previous key frame by a pure encoder side and the key frame is connected with the next key frame by the pure encoder side, removing the key frame and obtaining a second time period according to the timestamp of the previous key frame and the timestamp of the next key frame; wherein the pure encoder edge is the occurrence of the visual failure between adjacent key frames;

step S502, acquiring all the measurement data of the first encoder and all the measurement data measured by the second encoder in the second time period;

step S503, obtaining a prediction transformation matrix of the next key frame relative to the previous key frame according to all the measurement data measured by the first encoder and all the measurement data of the second encoder in the second time period.

Specifically, after a new key frame is acquired, consecutive key frames connected by pure encoder edges can be removed, wherein the consecutive key frames connected by pure encoder edges are "key frame 1-encoder edge 1-key frame 2-encoder edge 2-key frame 3", the relative pose of key frame 2 with respect to key frame 1 is a predicted pose obtained by encoder data (i.e., encoder edge 1), and the relative pose of key frame 3 with respect to key frame 2 is a predicted pose obtained by encoder data (i.e., encoder edge 2). removing the consecutive pure encoder edges can reduce the amount of calculation for subsequent optimization and prevent large errors caused by triangularization using only encoder data (wherein the encoder has lower accuracy than vision due to the accuracy of the encoder and friction with the ground). Removing the continuous key frames connected by the pure encoder edge, and for the three key frames in the above example, deleting the continuous key frames 2 connected by the pure encoder edge to obtain "key frame 1-encoder edge-key frame 3", where the encoder edge needs to be recalculated, that is, the predicted pose of the key frame 3 with respect to the key frame 1 is calculated through the pre-integration process in the above embodiment.

In this embodiment, a new map point can be created by triangularization, which is to find the position of the ray intersection point relative to two points (the depth or position of a matching point pair or a 3D point) from the two points (the relative pose of the keyframe) and the ray passing through them (the pixel coordinates, i.e., direction, of the matching point pair).

Specifically, for a binocular camera, for a certain fixed 3D point X existing in the space, the 3D point X is observed by a left camera and a right camera of the binocular camera, the 3D point is X at a left camera imaging position, and is X 'at a right camera imaging position, according to a pinhole camera model and a camera projection equation, we can respectively calculate two rays according to X and X', and the two rays can pass through the 3D point X on the ground. Based on this theory, the intersection of the two rays is the position of the spatial 3D point X, and the position of the 3D point X can be determined. For a monocular camera, images acquired by a left camera and a right camera of a binocular camera can be replaced by two adjacent frames (relative poses between two known frames), the two adjacent frames observe a certain fixed 3D point X in space at the same time, the three-angle process of the two adjacent frames is the same as that of the two adjacent frames of the binocular camera, and details are not repeated here. For the RGBD camera, since the coordinates of the feature points relative to the false right camera have been calculated in the preprocessing process, and the image acquired by the RGBD camera is equivalent to the image acquired by the binocular camera, the triangulation process same as that of the binocular camera can be adopted, and details are not repeated here.

In this embodiment, the keyframe pose and map points may be optimized via a local BA, where the local BA process is similar to the Motion-only BA optimization process in the above-described embodiment, and the objective function of its local BA also adds the encoder error term. In the local BA in the prior art, the keyframes participating in the local BA are usually common views of the current keyframe (that is, the map includes two types of vertices of frames and map points, and reprojection error edges between the frames and the map points that can be seen by the frames, the keyframes can see the map points that can be seen by the current keyframe, and the map points in the map are all the map points that can be seen by the keyframes), whereas in the present embodiment, the keyframe participating in the local BA is changed into N consecutive keyframes (N is greater than or equal to 1) forward from the current image frame, and an objective function of the local BA can be constructed by the N consecutive keyframes, so that the pose and the map points of the current keyframe are optimized, and the computation amount can be reduced.

Further, after obtaining the local map, the current key frame is sent to the loop detection thread S13 for performing loop detection to determine whether the movable platform is located at the position that has been reached.

In the prior art, closed loop detection, a map database is queried, similarity transformation (SE3 or Sim3) between a current key frame and a closed loop key frame is calculated by a RANSAC (Random Sample Consensus) algorithm, and map points seen by key frames near the current key frame and the closed loop key frame are fused by closed loop fusion. The RANSAC algorithm randomly adopts several pairs of matching points to calculate transformation, uses the theory of probability statistics to find the most correct transformation, and distinguishes which matching point pairs are correct (the matching point pairs are correct, and are called interior points).

In this embodiment, a Bag-of-words (BoW) vector of each key frame may be obtained, a distance (e.g., an euclidean distance) between the Bag-of-words vector of any key frame and the Bag-of-words vector of the current key frame may be obtained, and whether a closed loop exists may be determined according to the distance between the two Bag-of-words vectors, where the Bag-of-words vector is obtained according to feature points of the key frame and a preset feature point Bag, where the preset feature point Bag is constructed by feature clustering and includes feature point information arranged in a predetermined order, and the feature points of the key frame are matched with each feature point information in the preset feature point Bag, and if a certain feature point in the preset feature point Bag appears in the key frame, the Bag-of-words vector of the key frame is set to 1, and if the certain feature point in the preset feature point Bag does not appear in the key frame, the Bag-of-words vector of the. The bag-of-word vector of each key frame can be stored in the database, whether the current key frame is closed-loop or not can be judged by judging the distance between the bag-of-word vector of the current key frame and the bag-of-word vector of each key frame in the database, wherein the matched key frame is the closed-loop key frame of the current image frame.

Further, in this embodiment, the threshold for detecting the closed loop may be reduced after the visual failure, where the threshold for detecting the closed loop includes the threshold for repeating the detection times, calculating the number of matching points and the number of internal points of the similarity transformation, and by relaxing the condition for detecting the closed loop, the closed loop may be detected quickly after the visual failure, so as to reduce the error generated by the pure encoder side in the visual failure process. After the vision is recovered, the threshold value of the detection transformation can be recovered to be a normal threshold value, and the excessive and frequent closed-loop consumption calculation amount can be avoided.

Further, when a closed loop is detected, pose graph optimization can be performed.

The pose graph is a graph with vertexes not containing map points, only containing all key frames, and edges only containing relative transformation between frames on a support tree and similar transformation corresponding to all closed-loop edges, wherein the support tree (spanning tree) is a loop-free graph connecting the key frames in the whole map, two continuous key frames in general time can become parent and child nodes, but when a certain parent node is deleted, the parent node of the child node is modified according to the number of map points which are commonly seen (at this time, the pre-integral of the child node is updated according to the operation of correspondingly removing the key frames), and the root node is the initial key frame when the visual SLAM system builds the graph.

In the embodiment, the pose graph is optimized by the pose of each key frame, and the objective function is relative transformation

Error of pose of positive vertex

About protocolVariance matrix

The sum of squares (originally 3 × 3 identity matrix I).

In this embodiment, when a visual failure occurs between adjacent key frames, a corrected covariance matrix is generated according to the prediction transformation matrix to modify a covariance matrix of relative transformation between the adjacent key frames. In particular, covariance matrices obtained by pre-integration

Pure encoder edge corresponding relative transformation on modified support tree

Covariance matrix of

Making its diagonal elements generally larger than 1, i.e., indicating that the pure encoder edge is less accurate than the encoder-visually tightly coupled edge, allows the non-linear optimization to attenuate more the error introduced by the pure encoder edge. By the above pose graph optimization, errors introduced by pure encoder edges can be dispersed to all edges, for example, on a support tree, a key frame 1, a key frame 2 and a key frame 3, when a key frame 3 and a key 1 are closed, a closed-loop edge is formed between the key frame 1 and the key frame 3 (a similarity transformation SE3 or Sim3 between the current key frame and the closed-loop key frame), and accumulated errors of the key frame 1 and the key frame 3 can be dispersed to all edges of the key frame 1, the key frame 2 and the key frame 3, so that the errors of each edge are relatively small and within an acceptable range.

Furthermore, after the optimization of the attitude map is completed, global BA optimization can be carried out. The global BA is similar to the Motion-only BA and local BA optimization process in the above embodiment, and the target function of the global BA also adds an encoder error term, but is different from the local BA optimization process in that the key frames participating in the global BA are all key frames and map points in the entire map library. After the global BA is completed, the map can be updated, all map points are updated, and all key frames in the map library are updated until the latest key frame.

On the basis of any of the above embodiments, if the visual SLAM is tightly coupled to the IMU, the IMU needs to be initialized to obtain a zero offset (bias) of the IMU, including a zero offset of a gyroscope and a zero offset of an accelerometer in the IMU, and a gravity vector can also be obtained, and for a case where the monocular camera is tightly coupled to the IMU, a map scale needs to be determined through initialization of the IMU, while the binocular camera and the RGBD camera do not generally need to determine the map scale through initialization of the IMU, and of course, when the map scale is determined for the binocular camera and the RGBD camera through initialization of the IMU, an initialization result is close to 1.

In this embodiment, a predetermined number of key frames (for example, 20) at the beginning of the operation of the visual SLAM may be taken, the zero offset, the gravity vector, and the map scale of the IMU may be acquired according to the key frames, then the map scale may be updated to all the key frames, including the predetermined number of key frames and the new key frames obtained in the initialization process, then the global BA optimization is performed to update the map scale, and then the visual SLAM may operate at the updated map scale.

On the basis of any one of the above embodiments, the present embodiment further provides a lightweight positioning manner.

Specifically, in this embodiment, a simplest storage manner may be adopted after the map construction is completed, that is, only the necessary key frames and map point information of the map are stored, and binary files may be used for storage. The primary storage relates to the sensor type; the sequence number, the pose, the bag-of-words vector, the characteristic point sequence and the inter-frame sensor information sequence of the key frame, the sequence number information of map points which can be observed by the frame, and the sequence number information of a closed-loop key frame connected with the frame; the number and position of map points, the number of reference key frames, and the number information of key frames which can be observed by map points; support tree structure (i.e. father node sequence number)

During reading, reverse storage operation is adopted, the key frames and the map points are reconstructed, relevant information (such as operations of supporting a tree structure, sharing view information, adding the key frames into a map database and the like) is updated, the sequence numbers of the key frames and the sequence numbers of the map points can be reconstructed when the key frames and the map points are constructed, and the original sequence numbers are not adopted.

In the embodiment, the map information of the visual SLAM system which runs for a long time can be stored in a small storage space, the maximum working time of the visual SLAM system can be prolonged, and more time for reconstructing the map and the track is needed when the visual SLAM system is reloaded. When the reconstructed map is adopted, the visual SLAM can only start the tracking thread S11 for positioning, and does not need to start the local mapping thread S12 and the closed-loop thread, so that lightweight positioning is realized. It should be noted that, for the case where the monocular camera is tightly coupled to the IMU, the IMU initialization (acquiring the initial zero offset and gravitational acceleration) is still required.

The embodiment of the invention provides a movable platform. Fig. 7 is a structural diagram of a movable platform according to an embodiment of the present invention, and as shown in fig. 7, the movable platform 700 includes a first scroll wheel 701, a second scroll wheel 702, a first encoder 703, a second encoder 704, a camera 705, a memory 707, and a processor 706;

the first rolling wheel 701 and the second rolling wheel 702 are used for enabling the movable platform to displace, and the axes of the first rolling wheel 701 and the second rolling wheel 702 are coincident;

the camera 705 is used for collecting image frames;

said first encoder 703 is for detecting the velocity of said first scroll wheel 701 and said second encoder 704 is for detecting the velocity of said second scroll wheel 702;

the memory 707 is used to store program codes;

the processor 706, invoking the program code, when executed, is configured to:

acquiring all measurement data of the first encoder 703 and all measurement data measured by the second encoder 704 in the first time period;

obtaining a prediction transformation matrix of the current image frame relative to the previous image frame according to all the measurement data measured by the first encoder 703 and all the measurement data of the second encoder 704 in the first time period;

On the basis of the above embodiment, when the processor 706 obtains the prediction transformation matrix of the current image frame with respect to the previous image frame according to the measurement data of the first encoder and the measurement data of the second encoder in the first time period, the processor is configured to:

acquiring a first derivative of a rotation matrix of the current image frame with respect to time and a second derivative of the position of the current image frame with respect to time according to the measurement data of the first encoder and the measurement data of the second encoder in the first time period;

respectively integrating the first derivative and the second derivative based on the first time period to obtain a predicted rotation matrix and a predicted translation matrix of the current image frame relative to the previous image frame;

and obtaining a prediction transformation matrix of the current image frame relative to the previous image frame according to the prediction rotation matrix and the prediction translation matrix of the current image frame relative to the previous image frame.

On the basis of any of the above embodiments, after the processor 706 obtains the predicted pose of the current image frame according to the predicted change matrix and the pose information of the previous image frame, the processor 706 is further configured to:

optimizing the predicted pose of the current image frame according to a preset pure motion beam adjustment model; wherein the pure motion beam adjustment model comprises an encoder error term derived from the measurement data of the first encoder 703 and the measurement data of the second encoder 704.

On the basis of any of the above embodiments, the processor 706 is further configured to:

if the current image frame is in visual failure relative to the last image frame, the pose of the current image frame is the predicted pose;

and if the current image frame does not have visual failure relative to the last image frame, the pose of the current image frame is the optimization result of the pure motion beam adjustment model.

when the optimization result cannot be obtained through the pure motion beam adjustment model, determining whether the current image frame is in visual failure relative to the previous image frame; or the like, or, alternatively,

according to the optimized pose, performing feature matching on the current image frame and the previous image frame to obtain the number of internal points;

and when the number of the inner points is smaller than a preset threshold value, determining whether the current image frame is in visual failure relative to the last image frame.

when the current image frame is determined to be a key frame, updating the constructed map according to the current image frame on the basis of the constructed map; the constructed map is constructed from a keyframe determined prior to the current image frame.

On the basis of any of the above embodiments, when the processor 706 updates the constructed map according to the current image frame, the processor 706 is configured to:

taking out redundant key frames and redundant map points according to a preset strategy, and carrying out triangulation operation according to the current image frame to generate new map points;

after generating a new map point, constructing a common view of the current image frame to perform local beam adjustment so as to optimize other key frames which are in common view with the current image frame, the map point which can be seen by the current image frame and the map point which can be seen by the other key frames; wherein the local beam adjustment includes an encoder error term.

On the basis of any of the above embodiments, when the processor 706 constructs a co-view of the current image frame for local beam adjustment, the processor 706 is configured to:

when any key frame is connected with the previous key frame by a pure encoder and the key frame is connected with the next key frame by the pure encoder, removing the key frame and acquiring a second time period according to the timestamp of the previous key frame and the timestamp of the next key frame; wherein the pure encoder edge is the occurrence of the visual failure between adjacent key frames;

acquiring all the measurement data of the first encoder 703 and all the measurement data measured by the second encoder 704 in the second time period;

obtaining a prediction transformation matrix of the subsequent key frame relative to the previous key frame according to all the measurement data measured by the first encoder 703 and all the measurement data of the second encoder 704 in the second time period.

On the basis of any of the above embodiments, when the processor 706 constructs a co-view of the current image frame for local beam adjustment, the processor 706 is further configured to:

acquiring N continuous key frames preset before the current image frame, and constructing a local optimization map according to the current image frame and map points which can be seen by the current image frame, the N continuous key frames and the map points which can be seen by the N continuous key frames; n is greater than or equal to 1;

constructing the encoder error item according to a prediction transformation matrix between any two adjacent key frames in the local optimization map, and performing reprojection according to the poses of the map points on each key frame to acquire reprojection errors;

and optimizing the current image frame and the map points which can be seen by the current image frame, the N continuous key frames and the map points which can be seen by the continuous key frames according to the encoder error and the reprojection error, and updating the poses of all key frames and the positions of all map points in the local optimization map.

performing closed loop detection according to the characteristic points of the current image frame;

and when the current image frame and any key frame are detected to be closed, fusing map points seen by the current image frame and the key frame near the key frame.

On the basis of any of the above embodiments, after the processor 706 performs the closed loop detection, the processor 706 is further configured to:

constructing a pose graph to optimize all key frames so as to update the poses of all key frames; the pose graph includes all key frames and the relative transformation between two key frames.

On the basis of any of the above embodiments, when the processor 706 constructs the pose graph to optimize all the keyframes, the processor 706 is configured to:

and when visual failure occurs between the adjacent key frames, generating a corrected covariance matrix according to the prediction transformation matrix to modify a covariance matrix of relative transformation between the adjacent key frames.

On the basis of any of the above embodiments, after the processor 706 constructs the pose graph to optimize all the keyframes, the processor 706 is further configured to:

and updating all key frames and all map points according to the preset global beam adjustment.

In another embodiment, the processor 706, invoking the program code, when executed, is configured to:

The specific principle and implementation of the movable platform provided by the embodiment of the present invention are similar to those of the above embodiments, and are not described herein again.

The movable platform provided by the embodiment acquires the first time period according to the timestamp of the current image frame and the timestamp of the last image frame; acquiring measurement data of a first encoder and measurement data of a second encoder in the first time period; obtaining a prediction transformation matrix of the current image frame relative to the previous image frame according to the measurement data of the first encoder and the measurement data of the second encoder in the first time period; and obtaining the predicted pose of the current image frame according to the predicted change matrix and the pose information of the previous image frame. In the embodiment, the encoder is applied to the visual SLAM, and the pose of the current image frame is predicted by the encoder measuring data, so that the pose acquisition accuracy can be improved, the stability and the robustness of the visual SLAM are improved, and the visual SLAM can still stably run when the vision fails.

In addition, the present embodiment also provides a computer-readable storage medium on which a computer program is stored, the computer program being executed by a processor to implement the pose prediction method and/or the map construction method described in the above embodiments.

In addition, the present embodiment also provides a computer program including a program code that executes the pose prediction method and/or the map construction method according to the above embodiment when the computer program is executed by a computer.

In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.

The integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) or a processor (processor) to execute some steps of the methods according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

It is obvious to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional modules is merely used as an example, and in practical applications, the above function distribution may be performed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to perform all or part of the above described functions. For the specific working process of the device described above, reference may be made to the corresponding process in the foregoing method embodiment, which is not described herein again.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. The pose prediction method is applicable to a movable platform, the movable platform comprises a first rolling wheel, a second rolling wheel, a camera, a first encoder and a second encoder, the first rolling wheel and the second rolling wheel are used for enabling the movable platform to be displaced, and the axes of the first rolling wheel and the second rolling wheel are coincident; the camera is used for collecting image frames, the first encoder is used for detecting the speed of the first rolling wheel, and the second encoder is used for detecting the speed of the second rolling wheel; the method comprises the following steps:

2. The method of claim 1, wherein obtaining a predictive transform matrix for a current image frame relative to a previous image frame based on measurement data of a first encoder and measurement data of a second encoder during the first time period comprises:

3. The method according to claim 1 or 2, wherein after obtaining the predicted pose of the current image frame according to the predicted change matrix and pose information of the previous image frame, the method further comprises:

optimizing the predicted pose of the current image frame according to a preset pure motion beam adjustment model; wherein the pure motion beam adjustment model comprises an encoder error term calculated based on measurement data of the first encoder and measurement data of the second encoder.

4. The method of claim 3, further comprising:

5. The method of claim 4, further comprising:

when the optimization result cannot be obtained through the pure motion beam adjustment model, determining that the current image frame is in visual failure relative to the previous image frame; or the like, or, alternatively,

and when the number of the inner points is smaller than a preset threshold value, determining that the current image frame has visual failure relative to the last image frame.

6. The method of any one of claims 1-5, further comprising:

7. The method of claim 6, wherein said updating the constructed map from the current image frame comprises:

8. The method of claim 7, wherein said constructing a co-view of the current image frame performs local beam adjustment, comprising:

when any key frame is connected with the previous key frame by a pure encoder side and the key frame is connected with the next key frame by the pure encoder side, removing the key frame and acquiring a second time period according to the timestamp of the previous key frame and the timestamp of the next key frame; wherein the pure encoder edge is the occurrence of the visual failure between adjacent key frames;

acquiring all the measurement data of the first encoder and all the measurement data measured by the second encoder in the second time period;

and obtaining a prediction transformation matrix of the subsequent key frame relative to the previous key frame according to all the measurement data measured by the first encoder and all the measurement data measured by the second encoder in the second time period.

9. The method of claim 7 or 8, wherein said constructing a co-view of said current image frame performs local beam adjustment, further comprising:

10. The method according to any one of claims 1-9, further comprising:

11. The method of claim 10, further comprising, after performing closed loop detection:

12. The method of claim 11, wherein the constructing a pose graph to optimize all key frames comprises:

13. The method according to claim 11 or 12, wherein after the constructing the pose graph to optimize all key frames, further comprising:

14. A movable platform, comprising: the device comprises a first rolling wheel, a second rolling wheel, a camera, a first encoder, a second encoder, a memory and a processor;

the camera is used for collecting image frames;

the memory is used for storing program codes;

the processor, invoking the program code, when executed, is configured to:

15. The movable platform of claim 14, wherein when the processor obtains the predictive transform matrix for a current image frame relative to a previous image frame based on measurement data of the first encoder and measurement data of the second encoder during the first time period, the processor is configured to:

16. The movable platform of claim 14 or 15, wherein after the processor obtains the predicted pose for the current image frame from the predicted change matrix and pose information for a previous image frame, the processor is further configured to:

17. The movable platform of claim 16, wherein the processor is further configured to:

18. The movable platform of claim 17, wherein the processor is further configured to:

19. The movable platform of any one of claims 14-18, wherein the processor is further configured to:

20. The movable platform of claim 19, wherein, when the processor updates the constructed map from the current image frame, the processor is configured to:

21. The movable platform of claim 20, wherein, when the processor constructs a co-view of the current image frame for local beam adjustment, the processor is configured to:

22. The movable platform of claim 20 or 21, wherein, when the processor constructs a co-view of the current image frame for local beam adjustment, the processor is further configured to:

23. The movable platform of any one of claims 14-22, wherein the processor is further configured to:

24. The movable platform of claim 23, wherein after the processor performs closed loop detection, the processor is further configured to:

25. The movable platform of claim 24, wherein, when the processor constructs a pose graph to optimize all keyframes, the processor is configured to:

26. The movable platform of claim 24 or 25, wherein after the processor constructs a pose graph to optimize all keyframes, the processor is further configured to:

27. A computer-readable storage medium, having stored thereon a computer program for execution by a processor to perform the method of any one of claims 1-13.

28. A map construction method is characterized by being applicable to a movable platform, wherein the movable platform comprises a first rolling wheel, a second rolling wheel, a camera, a first encoder and a second encoder, the first rolling wheel and the second rolling wheel are used for enabling the movable platform to displace, and the axes of the first rolling wheel and the second rolling wheel are coincident; the camera is used for collecting image frames, the first encoder is used for detecting the speed of the first rolling wheel, and the second encoder is used for detecting the speed of the second rolling wheel; the method comprises the following steps:

29. The method of claim 28, wherein said updating the constructed map from the current image frame based on obtaining the constructed map comprises:

when the current image frame is determined to be the key frame, updating the constructed map according to the current image frame on the basis of the constructed map.

30. The method of claim 28 or 29, wherein obtaining the predictive transform matrix of the current image frame relative to the previous image frame based on the measurement data of the first encoder and the measurement data of the second encoder during the first time period comprises:

31. The method of any one of claims 28-30, further comprising, after obtaining the predicted pose for the current image frame:

32. The method of claim 31, after obtaining the predicted pose for the current image frame, further comprising:

33. The method of claim 32, after obtaining the predicted pose for the current image frame, further comprising:

34. The method of any of claims 28-33, wherein said updating said constructed map from said current image frame comprises:

35. The method of claim 34, wherein said constructing a co-view of said current image frame performs local beam adjustment comprising:

36. The method of claim 34 or 35, wherein said constructing a co-view of said current image frame performs local beam adjustment, further comprising:

37. The method of any one of claims 28-36, further comprising, after obtaining the predicted pose for the current image frame:

38. The method of claim 37, further comprising, after performing closed loop detection:

39. The method of claim 38, wherein the constructing a pose graph to optimize all key frames comprises:

40. The method according to claim 38 or 39, further comprising, after the constructing a pose graph to optimize all key frames:

41. A movable platform, comprising: the device comprises a first rolling wheel, a second rolling wheel, a camera, a first encoder, a second encoder, a memory and a processor;

the camera is used for collecting image frames;

the memory is used for storing program codes;

the processor, invoking the program code, when executed, is configured to:

42. The movable platform of claim 41, wherein, when the processor, upon obtaining an update to the constructed map from the current image frame based on the constructed map, the processor is configured to:

43. The movable platform of claim 41 or 42, wherein when the processor obtains the prediction transform matrix for a current image frame relative to a previous image frame from the measurement data of the first encoder and the measurement data of the second encoder during the first time period, the processor is configured to:

44. The movable platform of any one of claims 41-43, wherein after the processor obtains the predicted pose for the current image frame, the processor is further configured to:

45. The movable platform of claim 44, wherein after the processor obtains the predicted pose for the current image frame, the processor is further configured to:

46. The movable platform of claim 45, wherein after the processor obtains the predicted pose for the current image frame, the processor is further configured to:

47. The movable platform of any one of claims 41-46, wherein, when the processor updates the constructed map from the current image frame, the processor is configured to:

48. The movable platform of claim 47, wherein, when the processor constructs a co-view of the current image frame for local beam adjustment, the processor is configured to:

49. The movable platform of claim 47 or 48, wherein, in constructing a co-view of the current image frame for local beam adjustment by a processor, the processor is further configured to:

50. The movable platform of any one of claims 41-49, wherein after the processor obtains the predicted pose for the current image frame, the processor is further configured to:

51. The movable platform of claim 50, wherein after the processor performs closed loop detection, the processor is further configured to:

52. The movable platform of claim 51, wherein, when the processor constructs a pose graph to optimize all keyframes, the processor is configured to:

53. The movable platform of claim 51 or 52, wherein after the processor constructs a pose graph to optimize all keyframes, the processor is further configured to:

54. A computer-readable storage medium, having stored thereon a computer program for execution by a processor to perform the method of any one of claims 28-40.