CN113587916B - Real-time sparse vision odometer, navigation method and system - Google Patents

Real-time sparse vision odometer, navigation method and system

Info

Publication number
CN113587916B
CN113587916B
Authority
CN
China
Prior art keywords
frame
map
inter
real
key frames
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110849798.7A
Other languages
Chinese (zh)
Other versions
CN113587916A (en)
Inventor
刘宁
节笑晗
苏中
李擎
赵辉
赵旭
刘福朝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Information Science and Technology University
Original Assignee
Beijing Information Science and Technology University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Information Science and Technology University filed Critical Beijing Information Science and Technology University
Priority to CN202110849798.7A priority Critical patent/CN113587916B/en
Publication of CN113587916A publication Critical patent/CN113587916A/en
Application granted granted Critical
Publication of CN113587916B publication Critical patent/CN113587916B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/005Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 with correlation of navigation data from several sources, e.g. map or contour matching
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/10Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration
    • G01C21/12Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning
    • G01C21/16Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation
    • G01C21/165Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation combined with non-inertial navigation instruments
    • G01C21/1656Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation combined with non-inertial navigation instruments with passive imaging devices, e.g. cameras
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/26Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
    • G01C21/28Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network with correlation of data from several navigational instruments
    • G01C21/30Map- or contour-matching
    • G01C21/32Structuring or formatting of map data
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/38Electronic maps specially adapted for navigation; Updating thereof
    • G01C21/3804Creation or updating of map data
    • G01C21/3833Creation or updating of map data characterised by the source of data
    • G01C21/3841Data obtained from two or more sources, e.g. probe vehicles
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C22/00Measuring distance traversed on the ground by vehicles, persons, animals or other moving solid bodies, e.g. using odometers, using pedometers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29Geographical information databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

Provided are a real-time sparse visual odometer, a navigation method and a system. The real-time sparse visual odometer includes: a visual front end configured to extract key frames and feature points based on the inter-frame images read by the camera and the inter-frame IMU data read by the inertial sensor, add corresponding new data to the map based on the key frames and the feature points, and update the map; and a visual back end configured to, upon detecting that the map has been updated, remove incoherent data from the map to preserve the scale of the sparse map. The application solves the technical problem of inaccurate odometer navigation caused by inaccurate association of inter-frame images.

Description

Real-time sparse vision odometer, navigation method and system
Technical Field
The application relates to the field of navigation, in particular to a real-time sparse visual odometer, a navigation method and a system.
Background
Techniques based on Simultaneous Localization and Mapping (SLAM) are applied in navigation fields such as robotics. Real-time positioning and mapping can provide abundant environment and map information, which helps unmanned vehicles, robots and the like to recognize and perceive their environment. The availability of a map of the robot's workspace underpins tasks such as real-time positioning, motion planning and collision avoidance, and is the most important requirement for a robot to autonomously execute and complete target tasks. However, existing SLAM systems suffer to some extent from false matches between inter-frame images and from weak rotation estimation, while conventional inertial navigation suffers from navigation errors and inertial device errors that accumulate over time. In addition, increasing the amount of computation to improve accuracy means that real-time performance cannot be well guaranteed.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiment of the application provides a real-time sparse visual odometer, a navigation method and a system, which at least solve the technical problem of inaccurate odometer navigation caused by the inability to accurately associate inter-frame images.
According to an aspect of an embodiment of the present application, there is provided a real-time sparse visual odometer comprising: the visual front end is configured to extract key frames and feature points based on the inter-frame images read by the camera and the inter-frame IMU read by the inertial sensor, add corresponding new data to the map based on the key frames and the feature points, and update the map; a visual backend configured to, upon detecting that a map is updated, remove incoherent data in the map to preserve the scale of the sparse map.
According to another aspect of the embodiment of the present application, there is also provided a navigation method based on a real-time sparse visual odometer, including: extracting key frames and feature points based on the inter-frame images read by the camera and the inter-frame IMU read by the inertial sensor, adding corresponding new data into a map based on the key frames and the feature points, and updating the map; in the event that an update of the map is detected, irrelevant data in the map is removed to preserve the scale of the sparse map; navigation is performed based on the map.
According to another aspect of the embodiment of the present application, there is also provided a navigation system based on a real-time sparse visual odometer, including: an inertial sensor configured to read the inter-frame IMU data; a binocular camera configured to read the inter-frame images; and the real-time sparse visual odometer as described above.
In the embodiment of the application, the visual front end is configured to extract key frames and feature points based on the inter-frame images read by the camera and the inter-frame IMU data read by the inertial sensor, add corresponding new data to the map based on the key frames and the feature points, and update the map; the visual back end is configured to, upon detecting that the map has been updated, remove incoherent data from the map so as to keep the scale of the sparse map, thereby achieving the technical effect of accurate navigation and solving the technical problem of inaccurate odometer navigation caused by inaccurate association of inter-frame images.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a schematic structural diagram of a real-time sparse visual odometer in accordance with an embodiment of the application;
FIG. 2 is a schematic structural diagram of another real-time sparse visual odometer in accordance with an embodiment of the application;
FIG. 3 is a flow chart of a method of implementing a real-time sparse visual odometer in accordance with an embodiment of the application;
fig. 4 is a schematic structural diagram of a navigation system based on a real-time sparse visual odometer according to an embodiment of the application.
Detailed Description
In order that those skilled in the art will better understand the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without inventive effort shall fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
According to an embodiment of the present application, there is provided a real-time sparse visual odometer, as shown in fig. 1, including:
a vision front-end 10 configured to extract key frames and feature points based on the inter-frame images read by the camera and the inter-frame IMU read by the inertial sensor, and to add corresponding new data to the map based on the key frames and the feature points, and to update the map;
the visual backend 12 is configured to, in the event that a map is detected to be updated, remove incoherent data in the map to preserve the scale of the sparse map.
In one exemplary embodiment, the vision front-end 10 includes: an optical flow calculation module configured to calculate a pose of an inter-frame image based on optical flow tracking results of the inter-frame image and a previous frame image read by a camera; and the pose fusion and estimation module is configured to perform pose estimation of the inertial sensor based on the frame-to-frame state read by the inertial sensor, and fuse the pose of the inter-frame image with the pose estimation of the inertial sensor so as to extract the key frame.
In one exemplary embodiment, the vision front-end 10 further comprises: a key frame extraction module configured to extract the key frame based on the fused pose of the inter-frame image and the pose estimation of the inertial sensor; and a feature extraction module configured to extract the feature points based on the extracted key frames. The feature points are mainly hand-designed points with the following properties: 1. repeatability (the same features can be found again in different images); 2. distinguishability (different features have different expressions); 3. efficiency (the number of feature points in an image should be much smaller than the number of pixels); 4. locality (a feature is only related to a small image area). Each feature point consists of a key point and a descriptor.
In an exemplary embodiment, the feature extraction module is further configured to: select a pixel P on the inter-frame image and obtain the brightness Ip of the pixel P; select a plurality of pixels on a circle of a preset radius centered on the pixel P; if the circle has N continuous points with brightness greater than Ip+T or less than Ip-T, judge the pixel P to be a feature point, where T is a preset brightness threshold; and repeat the above steps until all pixels on the inter-frame image have been processed.
In an exemplary embodiment, the feature extraction module is further configured to delete pixels on the inter-frame image that are not corner points before selecting a pixel P on the inter-frame image.
In an exemplary embodiment, the feature extraction module is further configured to: in the small image block B of the inter image, the moment of the image block is defined as:
the centroid is determined based on the following formula:
in the small image block, a geometric center point O is connected with a centroid C, and the feature point is determined based on the following formula:
wherein m is pq For the moment of the image block, B is the selected image block, I (x, y) is the gray scale of the pixels of the image block at the coordinate points (x, y), p is the first pixel selected on the image block, q is the second pixel selected on the image block, m 10 For the first moment, m, of the third pixel selected on said image block 01 For the first moment, m, of the fourth pixel selected on said image block 00 And theta is the direction of the feature point for the zero-order moment on the image block.
In an exemplary embodiment, the vision front-end 10 further includes a new feature adding module configured to, when the feature point extraction effect does not reach the desired level, supplement new feature points for matching between the left and right binocular cameras and triangulation, and add new pose information to the map.
In one exemplary embodiment, the visual back end 12 includes: a check update module configured to perform a check update on the key frame and the feature point after the key frame and the feature point are acquired from the vision front end; the optimizing module is configured to optimize the key frames and the feature points after the inspection and updating, and returns an optimizing result; a scale control module configured to control the scale of the optimization within a predetermined range such that the scale does not increase over time.
Through the above embodiments, the following beneficial effects are obtained: 1. accurate association between visual odometer frames and inter-frame images; 2. improved system precision and rotation estimation accuracy through the combination of inertial navigation and visual SLAM; 3. suppression of the accumulated error of inertial navigation through loop optimization at the back end.
Example 2
According to an embodiment of the present application, there is provided a real-time sparse visual odometer based on a Zynq (scalable processing platform) binocular ORB (Oriented FAST and Rotated BRIEF, an algorithm for fast feature point extraction and description) design. As shown in fig. 2, the real-time sparse visual odometer includes three parts: the inertial navigation and vision front end 22, the back end 24, and the final map representation 26.
Inertial navigation, the vision front end 22 and the back end 24 each process data in separate threads. The inertial navigation and vision front end 22 extracts key frames from pictures taken by the camera and then adds new data to the map.
When the backend 24 detects a map update, an optimization is run that removes old key frames and map points from the map to preserve the scale of the sparse map.
Example 3
Referring to fig. 3, a flowchart of a method for implementing the real-time sparse vision odometer according to an embodiment of the application is shown in fig. 3, and includes the following steps:
step S302, the camera and IMU (inertial sensor) are initialized.
The camera reads the inter-frame images and the IMU reads the inter-frame inertial data.
Step S304, front-end processing.
An image frame read by the camera is inserted, image features are extracted, optical flow tracking against the previous frame is performed, and the pose of the frame is calculated from the optical flow result, as sketched below.
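For illustration only, the following is a minimal Python/OpenCV sketch of this front-end step, assuming a calibrated pinhole camera with intrinsic matrix K; the function name track_and_estimate_pose and all parameter values are illustrative and not part of the original disclosure.

```python
import cv2
import numpy as np

def track_and_estimate_pose(prev_img, curr_img, prev_pts, K):
    """Track feature points from the previous frame with pyramidal LK optical
    flow and recover the relative camera pose (illustrative sketch).
    prev_pts: Nx1x2 float32 array of feature locations in prev_img."""
    # Lucas-Kanade optical flow between consecutive frames
    curr_pts, status, _ = cv2.calcOpticalFlowPyrLK(
        prev_img, curr_img, prev_pts, None,
        winSize=(21, 21), maxLevel=3)
    good_prev = prev_pts[status.ravel() == 1]
    good_curr = curr_pts[status.ravel() == 1]

    # Essential matrix with RANSAC rejects remaining outlier tracks
    E, mask = cv2.findEssentialMat(good_curr, good_prev, K,
                                   method=cv2.RANSAC, prob=0.999, threshold=1.0)
    # Relative rotation R and (unit-scale) translation t of the current frame
    _, R, t, _ = cv2.recoverPose(E, good_curr, good_prev, K, mask=mask)
    return R, t, good_curr
```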
At the same time, the inter-frame state of the IMU is extracted for attitude calculation; the image pose is then fused with the inertial navigation pose estimate, and the pose information and the map are updated according to the fused pose.
When the tracking effect is not ideal, new feature points can be supplemented to match the left camera with the right camera, triangulation is performed, new pose information is added to the map, and loop optimization at the back end is triggered. A sketch of this step follows.
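As an assumption-laden illustration of the stereo supplementation step, the sketch below matches ORB features between the left and right images with a Hamming-distance matcher and triangulates them with the two projection matrices P_left and P_right, which are assumed to be known from calibration; names and thresholds are not from the original disclosure.

```python
import cv2
import numpy as np

def add_new_landmarks(left_img, right_img, P_left, P_right, orb=None):
    """Detect fresh ORB features, match them between the left and right
    camera images and triangulate new 3-D landmarks (illustrative sketch)."""
    orb = orb or cv2.ORB_create(nfeatures=1000)
    kp_l, des_l = orb.detectAndCompute(left_img, None)
    kp_r, des_r = orb.detectAndCompute(right_img, None)

    # Hamming-distance matcher for binary ORB descriptors
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_l, des_r)

    pts_l = np.float32([kp_l[m.queryIdx].pt for m in matches]).T  # 2xN
    pts_r = np.float32([kp_r[m.trainIdx].pt for m in matches]).T  # 2xN

    # Linear triangulation with the two 3x4 projection matrices
    pts4d = cv2.triangulatePoints(P_left, P_right, pts_l, pts_r)
    landmarks = (pts4d[:3] / pts4d[3]).T  # Nx3 points in the left-camera frame
    return landmarks
```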
Step S306, back-end processing.
The result of the front-end processing is used as the initial value of the back-end optimization. The back end acquires the processed key frames and map points, checks, updates and optimizes them, and returns an optimization result. The scale of the optimization problem is controlled within a certain range so that the amount of computation does not grow with time, and the map is then updated accordingly. A sketch of this scale control is given below.
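As a rough illustration of how the optimization scale might be bounded, the following sketch drops the oldest key frames, and the map points only they observe, from a fixed-size window. The window size, the map-point structure with an observations set, and the class name SlidingWindowBackend are all assumptions, not the patent's actual back-end implementation.

```python
from collections import deque

class SlidingWindowBackend:
    """Keeps the optimization problem bounded: old key frames (and the map
    points only they observe) are removed once the window is full, so the
    cost of each optimization run does not grow with time (sketch)."""

    def __init__(self, max_keyframes=10):
        self.window = deque()
        self.max_keyframes = max_keyframes

    def on_map_update(self, new_keyframe, map_points):
        # Check/update step: accept the new key frame from the front end
        self.window.append(new_keyframe)

        # Scale control: drop the oldest key frame when over budget
        while len(self.window) > self.max_keyframes:
            old_kf = self.window.popleft()
            for pt in list(map_points):
                pt.observations.discard(old_kf)   # forget this observation
                if not pt.observations:           # point no longer observed
                    map_points.remove(pt)         # remove incoherent/old data

        # An optimization step (e.g. bundle adjustment) would run here on the
        # key frames and points that remain inside the window.
        return self.window, map_points
```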
If tracking is lost, the camera and the IMU are immediately reset and re-initialized, and the above process is repeated.
There are various methods for extracting feature points. For example, the ORB algorithm, i.e., fast feature point extraction and description, may be used for the feature point extraction part. The FAST key points (feature points) are obtained as follows (see the sketch after this list):
(1) First a pixel p is selected, and its brightness is assumed to be Ip;
(2) A threshold T is set (e.g., twenty percent of Ip);
(3) Taking the pixel p as the center, sixteen pixels are selected on a circle with a radius of three pixels;
(4) If the selected circle contains N consecutive points with brightness greater than Ip+T or less than Ip-T, pixel p is considered a feature point (N is typically set to twelve, namely FAST-12);
(5) The above four steps are repeated for all pixels.
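The brightness test of steps (1)-(5) can be transcribed directly into Python as an educational sketch (a production system would use an optimized detector such as OpenCV's FastFeatureDetector); the 20% threshold and N = 12 follow the values given above, and everything else is an illustrative assumption.

```python
import numpy as np

# The 16 pixel offsets on a Bresenham circle of radius 3 around the candidate
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]

def fast12(gray, t_ratio=0.2, n=12):
    """FAST-12 corner test as in steps (1)-(5): a pixel p is a corner if the
    circle around it contains n contiguous pixels all brighter than Ip+T or
    all darker than Ip-T (educational sketch, not optimized)."""
    h, w = gray.shape
    corners = []
    for y in range(3, h - 3):
        for x in range(3, w - 3):
            ip = int(gray[y, x])
            t = ip * t_ratio                       # T chosen as 20% of Ip
            ring = [int(gray[y + dy, x + dx]) for dx, dy in CIRCLE]
            brighter = [v > ip + t for v in ring]
            darker = [v < ip - t for v in ring]
            # Duplicate the ring so a contiguous run may wrap around
            for flags in (brighter + brighter, darker + darker):
                run = 0
                for f in flags:
                    run = run + 1 if f else 0
                    if run >= n:
                        corners.append((x, y))
                        break
                else:
                    continue
                break
    return corners
```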
In a preferred example, to make the FAST-12 algorithm more efficient, a pre-test operation should be added to quickly eliminate the large portion of pixels that are not corner points. Corner points are extreme points, i.e., points that are prominent in some particular attribute, such as points of maximum local curvature on a curve. To address the weakness that FAST corner points have no orientation and no scale, ORB adds descriptions of scale and rotation, and the rotation of the feature is computed by the gray centroid method (Intensity Centroid) as follows.
(1) In a small image block B, the moment of the image block is defined as
m_pq = Σ_{(x,y)∈B} x^p · y^q · I(x,y), with p, q ∈ {0, 1}
(2) Next, the centroid of the image block is determined using the following formula:
C = ( m_10 / m_00 , m_01 / m_00 )
(3) The geometric center point O and the centroid C are connected in the image block (the image block is a rectangle, and the intersection of its diagonals is the geometric center point). Describing the direction vector OC, the direction of the feature point can be defined as
θ = arctan( m_01 / m_10 )
With the added description of scale and rotation, the robustness of the expression between different images is greatly enhanced; this improved FAST is called Oriented FAST. A short sketch of the orientation computation follows.
BRIEF is a binary descriptor whose vector consists of many 0s and 1s; each bit encodes the brightness relationship of two pixels (e.g., p and q) near the key point: if p is brighter than q, the bit is 1, otherwise it is 0. If 128 such pairs of p and q are taken, a 128-dimensional vector of 0s and 1s is eventually obtained, as sketched below.
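The binary test can be sketched as follows; the fixed random test pattern and the 31x31 neighbourhood are illustrative assumptions (real ORB uses a learned, rotation-steered pattern), and the keypoint is assumed to be away from the image border.

```python
import numpy as np

# A fixed 128-test pattern sampled once and reused for every keypoint:
# each row holds the offsets (px, py, qx, qy) within a 31x31 patch.
rng = np.random.default_rng(0)
PATTERN = rng.integers(-15, 16, size=(128, 4))

def brief_descriptor(gray, x, y, pattern=PATTERN):
    """BRIEF: each bit encodes whether pixel p is brighter than pixel q for a
    fixed set of point pairs around the keypoint at (x, y) (sketch)."""
    bits = np.empty(len(pattern), dtype=np.uint8)
    for i, (px, py, qx, qy) in enumerate(pattern):
        p_val = gray[y + py, x + px]
        q_val = gray[y + qy, x + qx]
        bits[i] = 1 if p_val > q_val else 0   # 1 if p brighter than q, else 0
    return bits
```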
In the motion estimation part, the landmark positions y = y_1, ..., y_n are obtained from sensor inputs and used to predict the robot states x_i and x_j and the motion between them. Visual features simplify the data association of landmarks: their appearance is quantified by feature descriptors, and similarity measures over these descriptors can be defined. Key point descriptor pairs (d_i, d_j) are then matched by computing distances in descriptor space. However, the distances themselves are not suitable as the association criterion, since the distance between corresponding descriptors may vary significantly. Thus, embodiments of the present disclosure use the relation between the distance to the nearest neighbor (d_n1) and the distance to the second nearest neighbor (d_n2) in descriptor space. For SIFT (scale-invariant feature transform) and SURF (Speeded Up Robust Features, a robust local feature point detection and description algorithm), this is
r = d_n1 / d_n2,
where r is the distance ratio in descriptor space, d_i and d_j are the key point descriptors of a matched pair, d_n1 is the distance from the key point's descriptor to its nearest neighbor, and d_n2 is the distance to its second nearest neighbor. The key points refer to the positions of the feature points in the image.
Assuming that a key point can only be matched with one other key point in another image, the distance to the second nearest neighbor should be much larger. To make the nearest neighbor search fast, the library for fast approximate nearest neighbor search (FLANN) implemented in the OpenCV library is used. The choice of feature detector and descriptor greatly affects the accuracy and runtime performance of the system; the OpenCV implementation used here offers a large selection of key point detectors and feature extractors. For ORB, the Hamming distance is used. The distance itself is not a suitable association criterion, because the distance between matching descriptors can vary greatly, and learning a mapping for the rejection threshold is often not feasible due to the high dimensionality of the feature space. A matching sketch follows.
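A hedged sketch of this matching step with OpenCV's FLANN matcher and the nearest/second-nearest distance ratio check is shown below; the LSH index parameters and the ratio value 0.7 are assumptions (for binary ORB descriptors a brute-force Hamming matcher is a common alternative).

```python
import cv2

def match_with_ratio_test(des_query, des_train, ratio=0.7):
    """k-NN matching with FLANN (LSH index for binary ORB descriptors, which
    must be uint8) followed by the nearest/second-nearest ratio test (sketch)."""
    index_params = dict(algorithm=6,          # FLANN_INDEX_LSH for binary descriptors
                        table_number=12, key_size=20, multi_probe_level=2)
    flann = cv2.FlannBasedMatcher(index_params, dict(checks=50))
    matches = flann.knnMatch(des_query, des_train, k=2)

    good = []
    for pair in matches:
        if len(pair) < 2:                     # not enough neighbours returned
            continue
        d_n1, d_n2 = pair                     # nearest and second-nearest neighbour
        if d_n1.distance < ratio * d_n2.distance:
            good.append(d_n1)                 # keep only unambiguous matches
    return good
```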
In each recursive step, the threshold of the inlier test is lowered, combined with a threshold on the minimum number of matched features required for a valid estimation. As the mapped area grows, indoor environments present additional challenges: man-made environments typically contain repeating structures, such as chairs of the same type, windows, or repeated wallpaper. Given enough similar features from such identical instances, the corresponding feature matches between two images lead to the estimation of a spurious transformation. The threshold on the minimum number of matches reduces the number of false estimates arising from randomly similar and repetitive objects.
Setting the threshold high enough to exclude such systematically erroneous estimates would cause a performance loss in scenarios without the mentioned ambiguity. Therefore, the proposed robustness measures are very advantageous in challenging scenarios. To account for the strongly anisotropic uncertainty of the measurements, the transformation estimation can be improved by minimizing the squared Mahalanobis distance instead of the squared Euclidean distance, which is known as two-frame sparse bundle adjustment.
Successful transformation estimation against much earlier frames (i.e., loop closures) can greatly reduce the accumulated error. To find large loop closures, a frame can be randomly sampled from a set of designated key frames. The key frame set is initialized with the first frame; any new frame that cannot be matched against the most recent key frame is added to the set as a key frame. In this way, the number of frames used for sampling is greatly reduced, while the field of view covered by the key frames contains a large portion of the perceived area. A sketch of this key frame logic follows.
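The key frame bookkeeping described above can be sketched as follows; match_frames and the minimum-match count are placeholders for whatever matching routine and threshold the system actually uses.

```python
import random

class KeyframeSet:
    """Maintains the key frame set described above: seeded with the first
    frame, grown whenever a new frame fails to match the latest key frame,
    and sampled at random for large loop-closure candidates (sketch)."""

    def __init__(self, first_frame):
        self.keyframes = [first_frame]        # initialized with the first frame

    def process(self, frame, match_frames, min_matches=40):
        # match_frames(a, b) is assumed to return the number of inlier matches
        if match_frames(frame, self.keyframes[-1]) < min_matches:
            self.keyframes.append(frame)      # frame becomes a new key frame

        # Randomly sample one earlier key frame as a loop-closure candidate
        candidate = random.choice(self.keyframes)
        return candidate
```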
Example 4
Referring to fig. 4, fig. 4 is a schematic structural diagram of a navigation system based on a real-time sparse visual odometer according to an embodiment of the application, wherein the real-time sparse visual odometer is implemented by a hardware implementation platform.
The navigation system includes a binocular camera 45 and an IMU sensor 46 and a real-time sparse visual odometer 50.
The hardware platform of the real-time sparse vision odometer 50 integrates two ARM A9 dual-core CPUs 40-1 and 40-2, a 25K programmable logic unit 41 and an 85K programmable logic unit 42, and supports both hardware programming and software programming. The hardware platform provides a dedicated camera interface 47 that can be connected to the binocular camera 45 and a dedicated sensor interface 48 that can be connected to the IMU sensor 46, so that the real-time sparse visual odometer can greatly accelerate the running speed of the algorithm. In addition, the platform has a FLASH output interface 43 and various other high-speed output interfaces 44, and can transmit its output results directly to other platforms.
The main characteristics of the hardware platform are as follows:
(1) Power management integrated on the core board: the base board is powered from the core board, which saves a power supply chip on the base board and reduces the hardware design cost of the base board.
(2) Core board + base board design: the design is flexible; the user designs a functional base board around the core board, which simplifies the base board hardware design, suits project applications, and facilitates secondary development.
(3) Compact size: facilitates the design of smaller functional base boards.
(4) Rich resources:
high performance interface: four camera interfaces, one HDMI, one gigabit network port, one SD interface, one USB-232 interface, one USB-OTG interface and two FEP interfaces;
GPIO/differential pairs: the 7010/mini7010 core board provides 102 IO / 48 differential pairs (2 IO on the PS side and 100 IO / 48 differential pairs on the PL side), and the 7020 core board provides 127 IO / 60 differential pairs (2 IO on the PS side and 125 IO / 60 differential pairs on the PL side). The base board FEPx2 interface has 48 GPIO / 24 differential pairs.
FEP interface: a high-speed communication interface that can be connected to an external daughter card to realize function expansion.
Abundant DEMOs: image acquisition, HLS image algorithm design, binocular/four-channel camera stitching and subtitle overlay display; Linux development; gigabit network communications, and the like.
A XILINX ZYNQ programmable FPGA chip is carried on each core board of the hardware platform: one core board uses the master chip model ZYNQ XC7Z020-CLG400-2I, and the other core board uses the master chip model XC7Z010-CLG400-1C. The XC7Z010-CLG400-1C integrates an ARM A9 dual-core CPU and 25K programmable logic units, and supports both hardware programming and software programming. The XC7Z020-CLG400-2I integrates an ARM A9 dual-core CPU and 85K programmable logic units, and likewise supports both hardware programming and software programming.
The core board carries a 4-bit SPI FLASH. The FLASH can be used to store data and code and to initialize the PL and PS subsystems. Its main technical parameters are as follows:
· 128 Mbit capacity
· x1, x2, and x4 modes supported
· Up to 400 Mb/s at the highest clock of 104 MHz (MZ7XA rates @ 100 MHz) in 4-bit mode
· Operates at 3.3 V
The platform board carries one HDMI interface; the HDMI part uses IO-emulated HDMI signals. The output can be transmitted at high definition up to 1080p@60 Hz, and the input supports up to 720p@60 Hz; an HDMI daughter card is preferably used for the input.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present application.
From the description of the above embodiments, it will be clear to a person skilled in the art that the method according to the above embodiments may be implemented by means of software plus the necessary general hardware platform, but of course also by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present application.
Example 5
The embodiment of the application also provides a storage medium. Alternatively, in this embodiment, the storage medium may implement the method in the embodiment corresponding to fig. 3.
Alternatively, in the present embodiment, the storage medium may include, but is not limited to: a U-disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a removable hard disk, a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
The integrated units in the above embodiments may be stored in the above-described computer-readable storage medium if implemented in the form of software functional units and sold or used as separate products. Based on such understanding, the technical solution of the present application may be embodied in essence or a part contributing to the prior art or all or part of the technical solution in the form of a software product stored in a storage medium, comprising several instructions for causing one or more computer devices (which may be personal computers, servers or network devices, etc.) to perform all or part of the steps of the method described in the embodiments of the present application.
In the foregoing embodiments of the present application, the descriptions of the embodiments are emphasized, and for a portion of this disclosure that is not described in detail in this embodiment, reference is made to the related descriptions of other embodiments.
In several embodiments provided by the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described embodiments of the apparatus are merely exemplary; for example, the division of the units is merely a logical function division and may be implemented in another manner: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed between the parts may be through some interfaces, units or modules, and may be in electrical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The foregoing is merely a preferred embodiment of the present application and it should be noted that modifications and adaptations to those skilled in the art may be made without departing from the principles of the present application, which are intended to be comprehended within the scope of the present application.

Claims (9)

1. A real-time sparse visual odometer, comprising:
the visual front end is configured to extract key frames and feature points based on the inter-frame images read by the camera and the inter-frame IMU read by the inertial sensor, add corresponding new data to the map based on the key frames and the feature points, and update the map;
a visual backend configured to, upon detecting that a map is updated, remove incoherent data in the map to preserve the scale of the sparse map;
the visual front end further comprises a new feature adding module, wherein the new feature adding module is configured to, when the effect of extracting the feature points does not reach the expected effect, supplement new feature points for matching between the left and right binocular cameras and triangulation, add new pose information to the map, and trigger loop optimization of the visual back end;
wherein, to find a large closed loop, the visual front-end is further configured to randomly sample a frame from a set of designated key frames, wherein the set of designated key frames is initialized with the first frame, and any new frames that cannot match the nearest key frame are added as key frames to the set of key frames.
2. The real-time sparse visual odometer of claim 1, wherein the visual front end comprises:
an optical flow calculation module configured to calculate a pose of an inter-frame image based on optical flow tracking results of the inter-frame image and a previous frame image read by a camera;
and the pose fusion and estimation module is configured to perform pose estimation of the inertial sensor based on the frame-to-frame state read by the inertial sensor, and fuse the pose of the inter-frame image with the pose estimation of the inertial sensor so as to extract the key frame.
3. The real-time sparse visual odometer of claim 2, wherein the visual front end further comprises:
a key frame extraction module configured to extract the key frame based on the pose of the interframe image and the pose estimation of the inertial sensor after fusion;
and a feature extraction module configured to extract the feature points based on the extracted key frames.
4. The real-time sparse visual odometer of claim 3, wherein the feature extraction module is further configured to:
selecting a pixel P on the inter-frame image and obtaining the brightness Ip of the pixel P;
selecting a plurality of pixels on a circle of a preset radius centered on the pixel P;
in the case that the circle has N continuous points with brightness greater than Ip+T or less than Ip-T, judging the pixel P to be the feature point, wherein T is a preset brightness threshold;
repeating the above process until all pixels on the inter-frame image have undergone the above processing.
5. The real-time sparse visual odometer of claim 4, wherein the feature extraction module is further configured to delete pixels on the inter-frame image that are not corner points prior to selecting a pixel P on the inter-frame image.
6. The real-time sparse visual odometer of claim 3, wherein the feature extraction module is further configured to:
in the image block B of the inter-frame image, defining the moment of the image block as
m_pq = Σ_{(x,y)∈B} x^p · y^q · I(x,y), with p, q ∈ {0, 1};
determining the centroid based on the following formula:
C = ( m_10 / m_00 , m_01 / m_00 );
in the image block, connecting the geometric center point O with the centroid C, and determining the direction of the feature point based on the following formula:
θ = arctan( m_01 / m_10 );
wherein m_pq is the moment of the image block, B is the selected image block, I(x,y) is the gray value of the pixel of the image block at the coordinate point (x,y), p and q are the orders of the moment in the x and y directions, m_10 and m_01 are the first-order moments of the image block, m_00 is the zero-order moment of the image block, and θ is the direction of the feature point.
7. The real-time sparse visual odometer of claim 1, wherein the visual back-end comprises:
a check update module configured to perform a check update on the key frame and the feature point after the key frame and the feature point are acquired from the vision front end;
the optimizing module is configured to optimize the key frames and the feature points after the inspection and updating, and returns an optimizing result;
a scale control module configured to control the scale of the optimization within a predetermined range such that the scale does not increase over time.
8. A navigation method based on real-time sparse visual odometer, comprising:
extracting key frames and feature points based on the inter-frame images read by the camera and the inter-frame IMU read by the inertial sensor, adding corresponding new data into a map based on the key frames and the feature points, and updating the map;
in the event that an update of the map is detected, irrelevant data in the map is removed to preserve the scale of the sparse map;
navigating based on the map;
when the effect of extracting the feature points does not reach the expected level, supplementing new feature points for matching between the left and right binocular cameras and triangulation, adding new pose information to the map, and triggering loop optimization at the visual back end;
wherein, in order to find a large closed loop, a frame is randomly sampled from a set of designated key frames, wherein the set of designated key frames is initialized with the first frame, and any new frames that cannot match the nearest key frame are added as key frames to the set of key frames.
9. A real-time sparse visual odometer-based navigation system, comprising:
an inertial sensor configured to read an interframe IMU;
a binocular camera configured to read the inter-frame images; and the real-time sparse visual odometer as claimed in any one of claims 1 to 7.
CN202110849798.7A 2021-07-27 2021-07-27 Real-time sparse vision odometer, navigation method and system Active CN113587916B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110849798.7A CN113587916B (en) 2021-07-27 2021-07-27 Real-time sparse vision odometer, navigation method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110849798.7A CN113587916B (en) 2021-07-27 2021-07-27 Real-time sparse vision odometer, navigation method and system

Publications (2)

Publication Number Publication Date
CN113587916A CN113587916A (en) 2021-11-02
CN113587916B true CN113587916B (en) 2023-10-03

Family

ID=78250294

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110849798.7A Active CN113587916B (en) 2021-07-27 2021-07-27 Real-time sparse vision odometer, navigation method and system

Country Status (1)

Country Link
CN (1) CN113587916B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108827315A (en) * 2018-08-17 2018-11-16 华南理工大学 Vision inertia odometer position and orientation estimation method and device based on manifold pre-integration
CN109443348A (en) * 2018-09-25 2019-03-08 同济大学 It is a kind of based on the underground garage warehouse compartment tracking for looking around vision and inertial navigation fusion
CN109631855A (en) * 2019-01-25 2019-04-16 西安电子科技大学 High-precision vehicle positioning method based on ORB-SLAM
CN109764869A (en) * 2019-01-16 2019-05-17 中国矿业大学 A kind of positioning of autonomous crusing robot and the three-dimensional map construction method of binocular camera and inertial navigation fusion
CN109993113A (en) * 2019-03-29 2019-07-09 东北大学 A kind of position and orientation estimation method based on the fusion of RGB-D and IMU information
WO2019157925A1 (en) * 2018-02-13 2019-08-22 视辰信息科技(上海)有限公司 Visual-inertial odometry implementation method and system
CN110794828A (en) * 2019-10-08 2020-02-14 福瑞泰克智能系统有限公司 Road sign positioning method fusing semantic information
CN110874100A (en) * 2018-08-13 2020-03-10 北京京东尚科信息技术有限公司 System and method for autonomous navigation using visual sparse maps
CN111220155A (en) * 2020-03-04 2020-06-02 广东博智林机器人有限公司 Method, device and processor for estimating pose based on binocular vision inertial odometer
WO2020155616A1 (en) * 2019-01-29 2020-08-06 浙江省北大信息技术高等研究院 Digital retina-based photographing device positioning method
CN111780754A (en) * 2020-06-23 2020-10-16 南京航空航天大学 Visual inertial odometer pose estimation method based on sparse direct method
CN111811506A (en) * 2020-09-15 2020-10-23 中国人民解放军国防科技大学 Visual/inertial odometer combined navigation method, electronic equipment and storage medium
CN113029173A (en) * 2021-03-09 2021-06-25 北京信息科技大学 Vehicle navigation method and device


Also Published As

Publication number Publication date
CN113587916A (en) 2021-11-02

Similar Documents

Publication Publication Date Title
Younes et al. Keyframe-based monocular SLAM: design, survey, and future directions
CN110555901B (en) Method, device, equipment and storage medium for positioning and mapping dynamic and static scenes
Brachmann et al. Uncertainty-driven 6d pose estimation of objects and scenes from a single rgb image
Endres et al. 3-D mapping with an RGB-D camera
US8452080B2 (en) Camera pose estimation apparatus and method for augmented reality imaging
EP3182373A1 (en) Improvements in determination of an ego-motion of a video apparatus in a slam type algorithm
US10872227B2 (en) Automatic object recognition method and system thereof, shopping device and storage medium
EP2770783B1 (en) A wearable information system having at least one camera
US9560273B2 (en) Wearable information system having at least one camera
EP2854104A1 (en) Semi-dense simultaneous localization and mapping
US10636190B2 (en) Methods and systems for exploiting per-pixel motion conflicts to extract primary and secondary motions in augmented reality systems
Jellal et al. LS-ELAS: Line segment based efficient large scale stereo matching
US20220282993A1 (en) Map fusion method, device and storage medium
Guclu et al. Fast and effective loop closure detection to improve SLAM performance
Alcantarilla et al. Learning visibility of landmarks for vision-based localization
Wietrzykowski et al. PlaneLoc: Probabilistic global localization in 3-D using local planar features
CN113610967B (en) Three-dimensional point detection method, three-dimensional point detection device, electronic equipment and storage medium
Pandey et al. Efficient 6-dof tracking of handheld objects from an egocentric viewpoint
CN113587916B (en) Real-time sparse vision odometer, navigation method and system
Pairo et al. A delay-free and robust object tracking approach for robotics applications
Clipp et al. Adaptive, real-time visual simultaneous localization and mapping
Pandey et al. Egocentric 6-DoF tracking of small handheld objects
Abdellaoui et al. Template matching approach for automatic human body tracking in video
CN113570667B (en) Visual inertial navigation compensation method and device and storage medium
Pupilli Particle filtering for real-time camera localisation

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant