CN115112123A - Multi-mobile-robot cooperative positioning method and system based on vision-IMU fusion - Google Patents

Multi-mobile-robot cooperative positioning method and system based on vision-IMU fusion

Info

Publication number
CN115112123A
Authority
CN
China
Prior art keywords
imu
robot
data
coordinate system
camera
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210737648.1A
Other languages
Chinese (zh)
Inventor
He Wangli (和望利)
Du Wenli (杜文莉)
Qian Feng (钱锋)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China University of Science and Technology
Original Assignee
East China University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China University of Science and Technology filed Critical East China University of Science and Technology
Priority to CN202210737648.1A priority Critical patent/CN115112123A/en
Publication of CN115112123A publication Critical patent/CN115112123A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01C - MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C 21/00 - Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C 21/10 - Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration
    • G01C 21/12 - Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning
    • G01C 21/16 - Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation
    • G01C 21/165 - Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation combined with non-inertial navigation instruments
    • G01C 21/1656 - Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation combined with non-inertial navigation instruments with passive imaging devices, e.g. cameras
    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01C - MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C 21/00 - Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C 21/20 - Instruments for performing navigational calculations

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of sensor fusion and cooperative positioning, in particular to a multi-mobile-robot cooperative positioning method and system based on vision-IMU fusion. The method comprises the following steps: step S1, calibrating parameters of the camera and the IMU; step S2, calculating IMU pre-integration and aligning the data; step S3, extracting ORB features of the image and performing feature matching according to BRIEF descriptors; step S4, establishing an optimization problem and solving it to obtain the optimal pose estimate in the local coordinate system of the robot terminal; step S5, sending the optimized pose and the extracted ORB feature data of the image to the server side; step S6, performing loop detection using a bag-of-words model; and step S7, performing pose graph optimization under the globally unified coordinate system and sending the optimization result to the robot terminal. The invention can accurately and stably output the poses of multiple robots in a globally unified coordinate system, and is of practical significance for cooperative task execution by multiple mobile robots in complex unknown environments.

Description

Multi-mobile-robot cooperative positioning method and system based on vision-IMU fusion
Technical Field
The invention relates to the technical field of sensor fusion and cooperative positioning, in particular to a multi-mobile-robot cooperative positioning method and system based on vision-IMU fusion.
Background
High-precision positioning is one of the basic common technologies of intelligent unmanned systems and plays an important role in fields such as autonomous robot navigation and cooperative control. A robot needs to estimate its position in the map at every moment while moving, that is, to answer the question "Where am I?". In particular, some complex tasks require multiple robots to cooperate with one another. Each robot therefore needs to know its own pose as well as its pose relative to the other robots, so a unified coordinate system is necessary to locate all robots.
According to the type of positioning device, positioning systems are mainly classified into systems that depend on external devices and autonomous systems that depend only on devices carried by the robot itself. Positioning systems that rely on external devices, such as GPS and motion capture systems, are highly accurate and robust but have limited applicability.
In long tunnels, dense urban high-rise environments, indoor scenes and the like, GPS cannot output high-precision positioning results because of signal loss or multipath effects.
A motion capture system requires the installation of multiple infrared cameras, the cameras and the field must be carefully calibrated in advance, and the robot can only move within the predefined capture area, which greatly limits its application.
Compared with external positioning equipment, sensors carried by the robot body, such as a camera and an IMU (Inertial Measurement Unit), have a wider application range.
The camera can collect images and provide rich visual information. However, vision-only localization methods based on a camera are prone to failure in complex environments such as changing illumination, fast viewpoint motion, and weakly textured or repetitively structured scenes.
The IMU can provide three-axis acceleration and three-axis angular velocity, but the measured data contain a large amount of noise, accumulated errors cannot be eliminated, and the IMU alone is prone to drift and failure over large-scale, long-distance trajectories.
Meanwhile, the onboard sensors of different robots acquire data in their own coordinate systems, which poses a great challenge for cooperative localization in a unified coordinate frame.
Disclosure of Invention
The invention aims to provide a multi-mobile-robot cooperative positioning method and system based on vision-IMU fusion, to address the low accuracy of multi-mobile-robot cooperative positioning in complex unknown environments in the prior art.
In order to achieve the above object, the present invention provides a multi-mobile-robot co-location method based on visual-IMU fusion, comprising the following steps:
step S1, calibrating parameters of the camera and the IMU, respectively acquiring internal parameters of the camera and the IMU, and respectively acquiring external parameters of the camera and the IMU;
step S2, the robot terminal calculates IMU pre-integration according to the IMU measurement data and the motion equation, and performs data alignment;
step S3, the robot terminal extracts ORB features of the image, performs feature matching according to BRIEF descriptors and removes feature mismatching;
s4, the robot terminal establishes an optimization problem according to an observation equation and a motion equation, and the optimization problem is solved to obtain the optimal pose estimation under the local coordinate system of the robot terminal;
s5, the robot terminal sends the pose obtained through optimization and the extracted ORB feature data of the image to a server side;
step S6, the server receives the poses and the characteristic data sent by all the robot ends, loop detection is carried out by using a bag-of-words model, and the relative pose transformation among different loops is calculated;
and S7, the server side optimizes the pose graph under the global unified coordinate to obtain the optimal pose estimation with global consistency, and sends the optimization result to the robot terminal.
In an embodiment, the step S1, further includes the following steps:
s11, acquiring camera image data by using a standard checkerboard calibration board, projecting a three-dimensional space point P of a camera coordinate system to a normalized image plane to form a normalized coordinate of the point P, projecting a point on the normalized coordinate to a pixel plane through camera internal parameters to obtain a position corresponding to the pixel coordinate system, and calculating to obtain the camera internal parameters;
s12, allowing the robot terminal to stand for a designated time, recording IMU measurement data, and calculating according to the IMU measurement data and an IMU error model to obtain IMU internal parameters;
and S13, acquiring camera image data and IMU measurement data by using a standard checkerboard calibration board, and calculating the transformation relation between the camera coordinate system and the IMU coordinate system according to the acquired camera image data and IMU measurement data by combining camera internal parameters and IMU internal parameters.
In an embodiment, the step S2 of performing pre-integration processing on the IMU data is further implemented by the following expression:
$$\alpha^{b_k}_{b_{k+1}} = \iint_{t \in [t_k, t_{k+1}]} R^{b_k}_t \left(\hat{a}_t - b_{a_t} - n_a\right) dt^2$$
$$\beta^{b_k}_{b_{k+1}} = \int_{t \in [t_k, t_{k+1}]} R^{b_k}_t \left(\hat{a}_t - b_{a_t} - n_a\right) dt$$
$$\gamma^{b_k}_{b_{k+1}} = \int_{t \in [t_k, t_{k+1}]} \tfrac{1}{2}\, \Omega\!\left(\hat{\omega}_t - b_{\omega_t} - n_\omega\right) \gamma^{b_k}_t\, dt$$
wherein $\alpha^{b_k}_{b_{k+1}}$, $\beta^{b_k}_{b_{k+1}}$ and $\gamma^{b_k}_{b_{k+1}}$ are the pre-integration terms of position, velocity and rotation respectively, $\hat{a}_t$ and $\hat{\omega}_t$ are the raw accelerometer and gyroscope measurements, $b_{a_t}$ and $b_{\omega_t}$ are the accelerometer and gyroscope random-walk errors (biases), $n_a$ and $n_\omega$ are the IMU intrinsic noise parameters, $\Omega(\cdot)$ is the quaternion right-multiplication matrix of the angular rate, and $R^{b_k}_t$ is the rotation of the IMU frame at time $t$ with respect to the frame $b_k$ in the IMU coordinate system.
In an embodiment, the step S3, further includes:
step S31, extracting FAST key points and BRIEF descriptors;
step S32, measuring the difference between the M-dimensional BRIEF descriptor vectors with the Hamming distance, searching with a fast approximate nearest neighbor algorithm, and matching the same features in adjacent images;
and step S33, removing characteristic mismatching by adopting a random sampling consistency algorithm.
In an embodiment, the step S4, further includes:
step S41, establishing a least square optimization problem according to an observation equation and a motion equation, wherein the corresponding expression is as follows:
$$\min_{\mathcal{X}} \left\{ \sum_{k} \left\| r_{\mathrm{IMU},k}(\mathcal{X}) \right\|^2 + \sum_{(l,j)} \rho\!\left( \left\| r_{\mathrm{cam},(l,j)}(\mathcal{X}) \right\|^2 \right) \right\}$$
wherein $\mathcal{X}$ denotes the state variables to be estimated; the first term is the residual between the IMU (Inertial Measurement Unit) measurements and the motion equation, the second term is the residual between the ORB (Oriented FAST and Rotated BRIEF) features extracted and matched from the camera images and the observation equation, and $\rho(\cdot)$ is a robust kernel function;
and S42, solving the least square optimization problem in the step S41 by using a Levenberg-Marquardt algorithm, and obtaining the optimal pose estimation under the local coordinate system of the robot terminal.
In an embodiment, the step S5, further includes:
the robot terminal only sends data to the server at a specific moment;
the transmitted data includes: the spatial position of the feature, the corresponding descriptor and the pose at the current moment.
In an embodiment, the step S6, further includes:
the server side sets a local coordinate system of the robot terminal corresponding to the received first frame data as a reference to be a global unified coordinate system;
and the server performs loop detection and relative pose transformation on data subsequently received by all the robot terminals, and maps the poses of all the robot terminals in the local coordinate system to the global unified coordinate system for representation.
In an embodiment, the loop detection in step S6 is implemented by using a bag-of-words model, and further includes:
collecting descriptors, and generating a dictionary by using a K-means clustering algorithm;
the characteristics of each key frame are represented by words and corresponding weights, and the difference of the two frames of data is calculated by comparing the weights corresponding to the two frames of data;
and when the difference of the two frames of data is lower than the threshold value, determining that the data is looped back.
In an embodiment, the step S7, further includes:
after a loop is detected, the server side performs pose graph optimization under a global unified coordinate system, optimizes the poses of all the robot terminals and sends optimization results to the corresponding robot terminals;
the global unified pose graph optimization is realized by the following formula:
$$\min_{\mathcal{X}} \left\{ \sum_{(i,j) \in \mathcal{S}} \left\| r_{i,j}(\mathcal{X}) \right\|^2 + \sum_{(i,j) \in \mathcal{L}} \left\| r_{i,j}(\mathcal{X}) \right\|^2 \right\}$$
wherein $\mathcal{X}$ denotes the poses of all robot terminals in the global unified coordinate system, the first term contains the residual terms constructed between poses of the same robot terminal (the set $\mathcal{S}$ of sequential edges), and the second term contains the residual terms constructed between different robot terminals according to the detected loops (the set $\mathcal{L}$ of loop-closure edges).
In order to achieve the above object, the present invention provides a multi-mobile-robot co-location system based on visual-IMU fusion, which includes a plurality of robot terminals and a server side:
the plurality of robot terminals are respectively communicated with the server terminal to carry out data interaction,
the plurality of robot terminals and the server terminal are used for realizing the method according to any one of the above items.
The vision-IMU fusion-based multi-mobile-robot cooperative positioning method and system are used for autonomous cooperative positioning of mobile robots, can accurately and stably output the poses of multiple robots in a globally unified coordinate system, and are of practical significance for cooperative task execution by multiple mobile robots in complex unknown environments.
Drawings
The above and other features, properties and advantages of the present invention will become more apparent from the following description of the embodiments with reference to the accompanying drawings in which like reference numerals denote like features throughout the several views, wherein:
FIG. 1 discloses a flow chart of a co-location method for multiple mobile robots based on visual-IMU fusion according to an embodiment of the invention;
fig. 2 discloses a robot terminal architecture diagram according to an embodiment of the invention;
FIG. 3 discloses an architecture diagram of a multi-mobile-robot cooperative positioning system based on vision-IMU fusion according to an embodiment of the present invention;
FIG. 4 discloses a single robot positioning trajectory results diagram according to an embodiment of the invention;
FIG. 5 is a diagram illustrating a result of multi-robot co-location trajectory in accordance with an embodiment of the present invention.
The meanings of the reference symbols in the figures are as follows:
11 a robot terminal;
1n robot terminals;
20 server side.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The vision-IMU fusion-based multi-mobile-robot cooperative positioning method and system provided by the invention are oriented to complex unknown environments, are based on a state estimation paradigm of tight coupling optimization, carry out high-precision multi-robot cooperative positioning through sensor fusion, can accurately and stably output the poses of multiple robots under a unified coordinate system on an embedded platform, and can be applied to the fields of service robots, search and rescue robots, automatic driving and the like.
Fig. 1 discloses a flow chart of a co-location method for multiple mobile robots based on visual-IMU fusion according to an embodiment of the present invention, and as shown in fig. 1, the co-location method for multiple mobile robots based on visual-IMU fusion proposed by the present invention includes the following steps:
step S1, calibrating parameters of the camera and the IMU, respectively acquiring internal parameters of the camera and the IMU, and respectively acquiring external parameters of the camera and the IMU;
step S2, the robot terminal calculates IMU pre-integration according to the IMU measurement data and the motion equation, and performs data alignment;
step S3, the robot terminal extracts the ORB characteristics of the image, performs characteristic matching according to the BRIEF descriptor and removes characteristic mismatching;
s4, the robot terminal establishes an optimization problem according to the observation equation and the motion equation, and the optimization problem is solved to obtain the optimal pose estimation under the local coordinate system of the robot terminal;
s5, the robot terminal sends the pose obtained through optimization and the extracted ORB feature data of the image to a server side;
step S6, the server receives the poses and the characteristic data sent by all the robot ends, loop detection is carried out by using a bag-of-words model, and the relative pose transformation among different loops is calculated;
and S7, the server side optimizes the pose graph under the global unified coordinate to obtain the optimal pose estimation with global consistency, and sends the optimization result to the robot terminal.
According to the vision-IMU fusion-based multi-mobile-robot cooperative positioning method and system, image features and IMU acceleration data are fused for positioning, the robot terminal sends coordinates under a local coordinate system of the robot terminal and collected feature data to the server side, the server side performs loop detection, relative pose transformation and global pose graph optimization, the coordinates of the robot are mapped to global unified coordinates, the optimized global coordinates are sent to the corresponding robot terminal, advantages of a camera and advantages of an IMU sensor are complemented, and positioning accuracy and robustness of the robot in a complex environment are improved.
These steps will be described in detail below. It is understood that within the scope of the present invention, the above-mentioned technical features of the present invention and the technical features described in detail below (e.g., the embodiments) can be combined with each other and associated with each other to constitute a preferred technical solution.
And step S1, calibrating parameters of the camera and the IMU, respectively acquiring internal parameters of the camera and the IMU, and respectively acquiring external parameters of the camera and the IMU.
Calibrating parameters of the RGB camera and the IMU, and acquiring respective internal parameters of the camera and the IMU and external parameters of the camera and the IMU, namely coordinate transformation relation between the camera and the IMU.
The camera intrinsic parameters are calibrated with the open-source Kalibr calibration tool in ROS, and the IMU intrinsic parameters are calibrated with the imu_utils tool.
Further, the step S1 includes the steps of:
Step S11, using a standard checkerboard calibration board to collect camera image data, projecting a three-dimensional space point P = (X, Y, Z) in the camera coordinate system onto the normalized image plane to obtain the normalized coordinates [x, y] = [X/Z, Y/Z] of the point P, projecting the point in normalized coordinates onto the pixel plane through the camera intrinsic parameters to obtain the corresponding position [u, v] in the pixel coordinate system, and calculating the camera intrinsic parameters.
The corresponding expression is as follows:
$$Z \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}$$
wherein $f_x$, $f_y$, $c_x$ and $c_y$ are the camera intrinsic parameters.
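As an illustration of the projection described above, the following minimal Python sketch maps a 3D point in the camera frame to pixel coordinates; the intrinsic values used in the example are hypothetical and not taken from the patent.

```python
import numpy as np

def project_to_pixel(P_cam, fx, fy, cx, cy):
    """Pinhole projection of a 3D camera-frame point to pixel coordinates."""
    X, Y, Z = P_cam
    x, y = X / Z, Y / Z          # normalized image-plane coordinates [x, y] = [X/Z, Y/Z]
    u = fx * x + cx              # apply the intrinsics to obtain the pixel position [u, v]
    v = fy * y + cy
    return u, v

# Example with hypothetical intrinsics
print(project_to_pixel(np.array([0.5, -0.2, 2.0]), fx=460.0, fy=460.0, cx=320.0, cy=240.0))
```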
And S12, standing the robot carrying the IMU for a specified time, recording IMU measurement data, and calculating to obtain IMU internal parameters according to the IMU measurement data and the IMU error model.
The corresponding expression is as follows:
$$\hat{a}_t = a_t + b_{a_t} + R^t_w\, g^w + n_a, \qquad \hat{\omega}_t = \omega_t + b_{\omega_t} + n_\omega$$
wherein $b_{a_t}$ and $b_{\omega_t}$ are the random-walk errors of the linear acceleration (accelerometer) and the angular rate (gyroscope) respectively, $n_a$ and $n_\omega$ are the accelerometer and gyroscope Gaussian white noise, $g^w$ is the gravity vector in the world coordinate system, and $R^t_w$ is the rotation from the world coordinate system to the IMU coordinate system at time $t$.
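For illustration only, the sketch below shows one rough way to reduce a stationary IMU recording to bias and noise estimates; the function name and the simple statistics are assumptions of this example, whereas a full calibration (e.g. the Allan-variance analysis performed by imu_utils) also characterizes the bias random walk.

```python
import numpy as np

def stationary_imu_stats(acc, gyr, dt):
    """Rough IMU intrinsic estimates from a stationary recording.

    acc, gyr : (N, 3) arrays of accelerometer [m/s^2] and gyroscope [rad/s] samples
    dt       : nominal sample interval in seconds
    Returns a rough gyroscope bias and approximate white-noise densities; it does
    not estimate the bias random walk, which a full Allan-variance analysis would.
    """
    gyro_bias = gyr.mean(axis=0)                  # stationary => mean gyro output approximates the bias
    acc_noise = acc.std(axis=0) * np.sqrt(dt)     # crude continuous-time noise-density approximation
    gyro_noise = gyr.std(axis=0) * np.sqrt(dt)
    return gyro_bias, acc_noise, gyro_noise
```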
S13, collecting camera image data and IMU measurement data by using a standard checkerboard calibration board, and calculating to obtain a transformation relation between a camera coordinate system and an IMU coordinate system according to the collected camera image data and IMU measurement data by combining the camera internal parameters and the IMU internal parameters obtained in the S11 and the S12
$T^b_c$ (the camera-IMU extrinsic parameters).
And step S2, the robot terminal calculates IMU pre-integration according to the IMU measurement data and the motion equation, and performs data alignment.
Fig. 2 discloses a robot terminal architecture diagram according to an embodiment of the present invention. As shown in fig. 2, because the acquisition frequency of the IMU is much higher than that of the camera and the timestamps of the two sensors are not consistent, and in order to reduce the amount of computation in the subsequent optimization, the IMU data are pre-integrated so as to align the camera images with the IMU data, which is implemented by the following formulas:
$$\alpha^{b_k}_{b_{k+1}} = \iint_{t \in [t_k, t_{k+1}]} R^{b_k}_t \left(\hat{a}_t - b_{a_t} - n_a\right) dt^2$$
$$\beta^{b_k}_{b_{k+1}} = \int_{t \in [t_k, t_{k+1}]} R^{b_k}_t \left(\hat{a}_t - b_{a_t} - n_a\right) dt$$
$$\gamma^{b_k}_{b_{k+1}} = \int_{t \in [t_k, t_{k+1}]} \tfrac{1}{2}\, \Omega\!\left(\hat{\omega}_t - b_{\omega_t} - n_\omega\right) \gamma^{b_k}_t\, dt$$
wherein $\alpha^{b_k}_{b_{k+1}}$, $\beta^{b_k}_{b_{k+1}}$ and $\gamma^{b_k}_{b_{k+1}}$ are the pre-integration terms of position, velocity and rotation respectively, $\hat{a}_t$ and $\hat{\omega}_t$ are the raw accelerometer and gyroscope measurements, $b_{a_t}$ and $b_{\omega_t}$ are the accelerometer and gyroscope random-walk errors (biases), $n_a$ and $n_\omega$ are the IMU intrinsic noise parameters, $\Omega(\cdot)$ is the quaternion right-multiplication matrix of the angular rate, and $R^{b_k}_t$ is the rotation of the IMU frame at time $t$ with respect to the frame $b_k$ in the IMU coordinate system.
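A simplified, discrete-time sketch of this pre-integration between two camera frames is given below; it omits noise terms, covariance propagation and bias re-linearization, and the function and variable names are illustrative rather than taken from the patent.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def preintegrate(acc, gyr, dt, ba, bg):
    """Discrete IMU pre-integration between two camera frames.

    acc, gyr : (N, 3) raw accelerometer / gyroscope samples
    dt       : (N,) sample intervals in seconds
    ba, bg   : (3,) accelerometer / gyroscope bias estimates
    Returns the position (alpha), velocity (beta) and rotation (gamma)
    pre-integration terms expressed in the first IMU frame b_k.
    """
    alpha = np.zeros(3)             # position pre-integration term
    beta = np.zeros(3)              # velocity pre-integration term
    gamma = R.identity()            # rotation from the current IMU frame to b_k
    for a, w, h in zip(acc, gyr, dt):
        a_corr = gamma.apply(a - ba)                 # bias-corrected acceleration, rotated into b_k
        alpha += beta * h + 0.5 * a_corr * h * h
        beta += a_corr * h
        gamma = gamma * R.from_rotvec((w - bg) * h)  # integrate the bias-corrected angular rate
    return alpha, beta, gamma
```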
And step S3, the robot terminal extracts ORB features of the image, performs feature matching according to BRIEF descriptors and removes feature mismatching.
The robot terminal extracts ORB features of the image, performs feature matching according to BRIEF descriptors, and removes feature mismatching by using a random sample consensus (RANSAC) algorithm.
As shown in fig. 2, the step S3 further includes:
step S31, extracting FAST key points and BRIEF descriptors;
the extracted ORB features consist of FAST key points and BRIEF descriptors.
The FAST key point extraction process comprises the following steps:
First, a pixel point p in the image is selected and its brightness I is recorded; the 16 pixel points on a circle of radius 3 centered at p are selected, and a threshold T (0.2·I in this embodiment) is set.
If at least N (12 in this embodiment) contiguous points on the selected circle have brightness greater than I + T or less than I - T, the point p is considered a key point.
The extraction process of the BRIEF descriptor is as follows:
For M (128 in this embodiment) predefined, relatively random positions around the key point, the corresponding bit is 1 if the brightness of the pixel at that position is higher than that of the FAST key point, and 0 otherwise, forming an M-dimensional binary vector.
It should be emphasized that the threshold T and the number of points N can be set as required, and the dimension M of the BRIEF descriptor vector can likewise be set as required.
Step S32, using the difference of M-dimensional vectors of Hamming distance measurement BRIEF descriptors, and using a fast approximate nearest neighbor algorithm to search, and matching the same characteristics in adjacent images;
and step S33, removing characteristic mismatching by adopting a random sampling consistency algorithm.
The random sample consensus (RANSAC) algorithm is a simple and effective method for removing noise, initially estimates model parameters by using as few points as possible, then continuously iterates and expands the influence range of the obtained model parameters, and data which do not meet the model at the end of iteration are regarded as noise. In this embodiment, a random sample consensus algorithm is used to remove mismatches.
The process of removing mismatches with the random sample consensus (RANSAC) algorithm is as follows:
select n matching features (assumed correct) from all the matched features and fit them to obtain a transformation relation between the two frames of images;
for the remaining matched features, compute the distance from each matched feature to this transformation relation; a match whose distance exceeds the threshold is regarded as a mismatch, a match whose distance does not exceed the threshold is regarded as a correct match, and the process is repeated.
And S4, the robot terminal establishes an optimization problem according to the observation equation and the motion equation, and the optimization problem is solved to obtain the optimal pose estimation under the local coordinate system of the robot terminal.
The robot terminal establishes an optimization problem according to the observation equation and the motion equation, and solves it with the Levenberg-Marquardt algorithm to obtain the optimal pose estimate in its own local coordinate system.
As shown in fig. 2, the step S4 further includes:
step S41, establishing a least square optimization problem according to the observation equation and the motion equation, wherein the formula expression form is as follows:
$$\min_{\mathcal{X}} \left\{ \sum_{k} \left\| r_{\mathrm{IMU},k}(\mathcal{X}) \right\|^2 + \sum_{(l,j)} \rho\!\left( \left\| r_{\mathrm{cam},(l,j)}(\mathcal{X}) \right\|^2 \right) \right\}$$
wherein $\mathcal{X}$ denotes the state variables to be estimated; the first term is the residual between the IMU (Inertial Measurement Unit) measurements and the motion equation, the second term is the residual between the ORB (Oriented FAST and Rotated BRIEF) features extracted and matched from the camera images and the observation equation, and $\rho(\cdot)$ is a robust kernel function;
$\rho(\cdot)$ is a robust kernel function used to suppress outlier noise terms, implemented by the following (Huber-type) formula:
$$\rho(s) = \begin{cases} s, & s \le 1 \\ 2\sqrt{s} - 1, & s > 1 \end{cases}$$
wherein $s$ is the squared residual between the ORB features extracted and matched from the camera image and the observation equation.
And step S42, solving the optimization problem in step S41 with the Levenberg-Marquardt algorithm to obtain the optimal pose estimates in the respective local coordinate systems.
The Levenberg-Marquardt algorithm is an iterative optimization algorithm: given initial values of the optimization variables, it seeks an increment at each iteration that reduces the objective function, controls the iteration step size through a trust region, and stops iterating when the found increment is sufficiently small.
In this embodiment, the levenberg-marquardt algorithm is used to solve the optimization problem in step S41, and obtain optimal pose estimates in the respective local coordinate systems of the robot terminals.
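The sketch below illustrates solving a small robustified least-squares problem with SciPy; the residuals are a toy stand-in and not the patent's actual IMU and visual residuals. Note that SciPy's 'lm' (Levenberg-Marquardt) method does not support robust losses, so the trust-region 'trf' solver with a Huber loss is used here to mirror the role of the kernel ρ(·).

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(x, measurements, predictions):
    # Toy residual: predicted observations minus measurements;
    # stands in for the IMU and visual residual terms of the real problem.
    return (predictions @ x) - measurements

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3))           # hypothetical linearized "observation" rows
x_true = np.array([0.5, -1.0, 2.0])
z = A @ x_true + 0.01 * rng.normal(size=50)
z[::10] += 2.0                         # inject outliers to motivate the robust kernel

sol = least_squares(residuals, x0=np.zeros(3), args=(z, A),
                    method="trf", loss="huber", f_scale=0.1)
print(sol.x)                           # close to x_true despite the outliers
```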
And S5, the robot terminal sends the pose obtained by optimization and the extracted ORB feature data of the image to a server side.
The robot terminal is configured as shown in fig. 2, and the robot terminal sends the estimated pose and the extracted image ORB features to the server side.
In order to calculate the coordinates of each robot in the global unified coordinate system, each robot terminal needs to send its own information to the server for processing.
Meanwhile, in order to save communication bandwidth, the robot terminal only sends necessary data to the server side at a specific moment.
Specifically, the data is transmitted when the absolute distance between the current position of the robot terminal and the position at the last data transmission exceeds 10 cm, or when the time elapsed since the last data transmission exceeds 1 second.
The transmitted necessary data includes the spatial position of the feature, the corresponding descriptor and the pose at the current time.
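A minimal sketch of this transmission rule, assuming positions in metres and timestamps in seconds (the class name and interface are illustrative), could look as follows.

```python
import numpy as np

class KeyframeSender:
    """Decide when the robot terminal should send data to the server:
    send if the robot has moved more than 10 cm since the last transmission
    or if more than 1 s has elapsed."""

    def __init__(self, dist_thresh=0.10, time_thresh=1.0):
        self.dist_thresh = dist_thresh
        self.time_thresh = time_thresh
        self.last_pos = None
        self.last_time = None

    def should_send(self, position, timestamp):
        if self.last_pos is None:
            send = True                                   # always send the first frame
        else:
            moved = np.linalg.norm(np.asarray(position) - self.last_pos)
            elapsed = timestamp - self.last_time
            send = moved > self.dist_thresh or elapsed > self.time_thresh
        if send:
            self.last_pos = np.asarray(position, dtype=float)
            self.last_time = timestamp
        return send
```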
And step S6, the server receives the poses and the characteristic data sent by all the robot ends, loop detection is carried out by using the bag-of-words model, and the relative pose transformation among different loops is calculated.
And the server side uses the bag-of-words model to perform loop detection, and if the loop is detected, the relative pose transformation is calculated.
And the server takes the local coordinate system of the robot terminal corresponding to the received first frame data as a reference and sets the local coordinate system as a global uniform coordinate system.
And for the data subsequently received by other robot terminals, the poses of all the robot terminals are transformed to a global unified coordinate system for representation through a loop detection method and relative pose transformation.
The server side uses the bag-of-words model to perform loop detection, finds images of the same place acquired by the same robot terminal at different moments or by different robot terminals, and establishes the data association; if a loop between different robots is detected, the relative pose transformation between the different robots is calculated and the two are aligned with the global coordinate system.
More specifically, after receiving data of different robot terminals, the server detects key frames of different robot terminals in the same scene from different perspectives through loop detection, and calculates a transformation relation between two local coordinate systems according to poses of the robot terminals in respective local coordinate systems and relative transformation of the key frames detected through loop detection, so that coordinates of the robot terminals in the local coordinate systems are mapped to a global unified coordinate system.
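The coordinate alignment can be illustrated with 4x4 homogeneous transforms as sketched below; the variable naming is hypothetical and the patent does not specify the implementation at this level of detail.

```python
import numpy as np

def compute_global_from_local(T_global_kfA, T_localB_kfB, T_kfA_kfB):
    """Given a loop closure between key frame A (already expressed in the global
    frame) and key frame B (expressed in robot B's local frame), recover the
    transform from robot B's local frame to the global frame.

    All arguments are 4x4 homogeneous transforms:
      T_global_kfA : pose of key frame A in the global frame
      T_localB_kfB : pose of key frame B in robot B's local frame
      T_kfA_kfB    : relative pose of B w.r.t. A from loop-closure matching
    """
    T_global_kfB = T_global_kfA @ T_kfA_kfB
    T_global_localB = T_global_kfB @ np.linalg.inv(T_localB_kfB)
    return T_global_localB

def map_pose_to_global(T_global_localB, T_localB_pose):
    """Express a pose from robot B's local frame in the global unified frame."""
    return T_global_localB @ T_localB_pose
```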
In this embodiment, loopback detection is implemented using a bag-of-words model.
The bag-of-words model describes the image features as words built from the descriptors.
The method for realizing loop detection by using the bag-of-words model specifically comprises the following steps:
First, descriptors are collected and a dictionary is generated based on the K-Means clustering algorithm, which accelerates the lookup of words in subsequent key frames.
In order to accelerate the search of words in the subsequent key frames, a hierarchical clustering mode is adopted, each type of data sample of each layer is divided into K types by using a K-Means algorithm, and a K-ary tree is formed to express a dictionary.
The features of each key frame are represented by words and corresponding weights; the difference between two frames of data is calculated by comparing the corresponding weights, implemented by the following formula:
$$s(q, d) = 1 - \frac{1}{2} \sum_i \left| \frac{q_i}{\|q\|} - \frac{d_i}{\|d\|} \right|$$
wherein $q_i$ and $d_i$ are the components of the bag-of-words feature vectors describing the two frames of data;
and when the difference of the two frames of data is lower than the threshold value, judging to return, namely the two frames of image data observe the same place.
The K-Means algorithm is a common clustering algorithm in unsupervised machine learning that classifies the data samples into K classes: K center points are selected at random initially, the distance between each sample and each center point is calculated and each sample is assigned to its nearest center, the center point of each class is then recalculated, and this is iterated until the change of each center point is small enough, at which point the algorithm has converged.
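For illustration, the sketch below builds a flat visual vocabulary with K-Means and compares two frames by their word histograms; the patent's hierarchical k-ary tree vocabulary (DBoW-style) is not reproduced here, and scikit-learn is an assumed dependency.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def build_vocabulary(descriptors, num_words=500, seed=0):
    """Cluster descriptors (treated as float vectors) into visual words;
    a flat simplification of the hierarchical k-ary tree dictionary."""
    kmeans = MiniBatchKMeans(n_clusters=num_words, random_state=seed)
    kmeans.fit(descriptors.astype(np.float32))
    return kmeans

def bow_vector(kmeans, frame_descriptors):
    """L1-normalized histogram of visual-word occurrences for one key frame."""
    words = kmeans.predict(frame_descriptors.astype(np.float32))
    hist = np.bincount(words, minlength=kmeans.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

def frame_difference(v1, v2):
    """Smaller means more similar; a loop is declared below a chosen threshold."""
    return 0.5 * np.abs(v1 - v2).sum()
```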
And S7, the server side optimizes the pose graph under the global unified coordinate to obtain the optimal pose estimation with global consistency, and sends the optimization result to the robot terminal.
After a loop is detected, the server side performs pose graph optimization under a global unified coordinate system, optimizes the poses of all the robot terminals, eliminates accumulated errors existing in all the robot terminals, and sends the obtained optimization results to the corresponding robot terminals;
the global pose graph optimization is realized by the following formula:
$$\min_{\mathcal{X}} \left\{ \sum_{(i,j) \in \mathcal{S}} \left\| r_{i,j}(\mathcal{X}) \right\|^2 + \sum_{(i,j) \in \mathcal{L}} \left\| r_{i,j}(\mathcal{X}) \right\|^2 \right\}$$
wherein $\mathcal{X}$ denotes the poses of all robot terminals in the global unified coordinate system, the first term contains the residual terms constructed between poses of the same robot terminal (the set $\mathcal{S}$ of sequential edges), and the second term contains the residual terms constructed between different robot terminals according to the detected loops (the set $\mathcal{L}$ of loop-closure edges).
Fig. 3 discloses an architecture diagram of a multi-mobile-robot cooperative positioning system based on vision-IMU fusion according to an embodiment of the present invention. As shown in fig. 3, the system uses a processor with abundant computing resources as the server side 20, together with the robot terminals 11, ..., 1n.
In the embodiment shown in fig. 3, the adopted architecture is a centralized architecture, that is, each of the robot terminals 11, ..., 1n communicates with the server side 20 for data interaction.
The robot terminals 11, ..., 1n and the server side 20 are used to implement the multi-mobile-robot co-location method based on vision-IMU fusion shown in figs. 1 and 2.
Each robot terminal comprises a vision inertial navigation odometer used for estimating the pose of the body under the local coordinate system of the robot terminal. As shown in fig. 3, the robot terminal 11 includes a camera 111, an IMU112, and a visual inertial navigation odometer 113, and the robot terminal 1n includes a camera 1n1, an IMU1n2, and a visual inertial navigation odometer 1n 3.
Each robot terminal sends the coordinates and the extracted image features in the respective local coordinate system to the server 20.
The server 20 receives the data sent by different robot terminals, performs loop detection, relative pose transformation and global pose graph optimization, and sends the optimized result to the corresponding robot terminal.
Fig. 4 discloses a result diagram of a positioning trajectory of a robot terminal according to an embodiment of the present invention, and as shown in fig. 4, experimental results obtained through tests in an actual complex scene indicate that the method and the system for co-positioning a multi-mobile robot based on vision-IMU fusion provided by the present invention have high positioning accuracy and robustness.
Fig. 5 discloses a multi-robot co-location trajectory result diagram according to an embodiment of the invention, and experimental results show that the multi-mobile robot co-location method and system based on vision-IMU fusion provided by the invention have good performance in an actual co-location scene.
The vision-IMU fusion-based multi-mobile-robot cooperative positioning method and system provided by the invention are used for autonomous cooperative positioning of mobile robots, can accurately and stably output the poses of multiple robots in a globally unified coordinate system, and are of practical significance for cooperative task execution by multiple mobile robots in complex unknown environments.
While, for purposes of simplicity of explanation, the methodologies are shown and described as a series of acts, it is to be understood and appreciated that the methodologies are not limited by the order of acts, as some acts may, in accordance with one or more embodiments, occur in different orders and/or concurrently with other acts from that shown and described herein or not shown and described herein, as would be understood by one skilled in the art.
As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included, these steps and elements do not form an exclusive list, and a method or apparatus may also include other steps or elements.
The embodiments described above are provided to enable persons skilled in the art to make or use the invention and that modifications or variations can be made to the embodiments described above by persons skilled in the art without departing from the inventive concept of the present invention, so that the scope of protection of the present invention is not limited by the embodiments described above but should be accorded the widest scope consistent with the innovative features set forth in the claims.

Claims (10)

1. A multi-mobile-robot co-location method based on vision-IMU fusion is characterized by comprising the following steps:
step S1, calibrating parameters of the camera and the IMU, respectively acquiring internal parameters of the camera and the IMU, and respectively acquiring external parameters of the camera and the IMU;
step S2, the robot terminal calculates IMU pre-integration according to the IMU measurement data and the motion equation, and performs data alignment;
step S3, the robot terminal extracts ORB features of the image, performs feature matching according to BRIEF descriptors and removes feature mismatching;
s4, the robot terminal establishes an optimization problem according to an observation equation and a motion equation, and the optimization problem is solved to obtain the optimal pose estimation under the local coordinate system of the robot terminal;
s5, the robot terminal sends the pose obtained through optimization and the extracted ORB feature data of the image to a server side;
step S6, the server receives the poses and the characteristic data sent by all the robot ends, loop detection is carried out by using a bag-of-words model, and the relative pose transformation among different loops is calculated;
and S7, the server side optimizes the pose graph under the global unified coordinate to obtain the optimal pose estimation with global consistency, and sends the optimization result to the robot terminal.
2. The vision-IMU fusion based multi-mobile-robot co-location method of claim 1, wherein the step S1 further comprises the steps of:
s11, acquiring camera image data by using a standard checkerboard calibration board, projecting a three-dimensional space point P of a camera coordinate system to a normalized image plane to form a normalized coordinate of the point P, projecting a point on the normalized coordinate to a pixel plane through camera internal parameters to obtain a position corresponding to the pixel coordinate system, and calculating to obtain the camera internal parameters;
s12, allowing the robot terminal to stand for a designated time, recording IMU measurement data, and calculating according to the IMU measurement data and an IMU error model to obtain IMU internal parameters;
and S13, acquiring camera image data and IMU measurement data by using a standard checkerboard calibration board, and calculating the transformation relation between the camera coordinate system and the IMU coordinate system according to the acquired camera image data and IMU measurement data by combining camera internal parameters and IMU internal parameters.
3. The vision-IMU fusion-based multi-mobile-robot co-location method of claim 1, wherein the step S2 is implemented by pre-integrating IMU data according to the following expression:
$$\alpha^{b_k}_{b_{k+1}} = \iint_{t \in [t_k, t_{k+1}]} R^{b_k}_t \left(\hat{a}_t - b_{a_t} - n_a\right) dt^2$$
$$\beta^{b_k}_{b_{k+1}} = \int_{t \in [t_k, t_{k+1}]} R^{b_k}_t \left(\hat{a}_t - b_{a_t} - n_a\right) dt$$
$$\gamma^{b_k}_{b_{k+1}} = \int_{t \in [t_k, t_{k+1}]} \tfrac{1}{2}\, \Omega\!\left(\hat{\omega}_t - b_{\omega_t} - n_\omega\right) \gamma^{b_k}_t\, dt$$
wherein $\alpha^{b_k}_{b_{k+1}}$, $\beta^{b_k}_{b_{k+1}}$ and $\gamma^{b_k}_{b_{k+1}}$ are the pre-integration terms of position, velocity and rotation respectively, $\hat{a}_t$ and $\hat{\omega}_t$ are the raw accelerometer and gyroscope measurements, $b_{a_t}$ and $b_{\omega_t}$ are the accelerometer and gyroscope random-walk errors (biases), $n_a$ and $n_\omega$ are the IMU intrinsic noise parameters, $\Omega(\cdot)$ is the quaternion right-multiplication matrix of the angular rate, and $R^{b_k}_t$ is the rotation of the IMU frame at time $t$ with respect to the frame $b_k$ in the IMU coordinate system.
4. The vision-IMU fusion based multi-mobile-robot co-location method of claim 1, wherein the step S3 further comprises:
step S31, extracting FAST key points and BRIEF descriptors;
step S32, measuring the difference between the M-dimensional BRIEF descriptor vectors with the Hamming distance, searching with a fast approximate nearest neighbor algorithm, and matching the same features in adjacent images;
and step S33, removing characteristic mismatching by adopting a random sampling consistency algorithm.
5. The vision-IMU fusion based multi-mobile-robot co-location method of claim 1, wherein the step S4 further comprises:
step S41, establishing a least square optimization problem according to an observation equation and a motion equation, wherein the corresponding expression is as follows:
$$\min_{\mathcal{X}} \left\{ \sum_{k} \left\| r_{\mathrm{IMU},k}(\mathcal{X}) \right\|^2 + \sum_{(l,j)} \rho\!\left( \left\| r_{\mathrm{cam},(l,j)}(\mathcal{X}) \right\|^2 \right) \right\}$$
wherein $\mathcal{X}$ denotes the state variables to be estimated; the first term is the residual between the IMU (Inertial Measurement Unit) measurements and the motion equation, the second term is the residual between the ORB (Oriented FAST and Rotated BRIEF) features extracted and matched from the camera images and the observation equation, and $\rho(\cdot)$ is a robust kernel function;
and S42, solving the least square optimization problem in the step S41 by using a Levenberg-Marquardt algorithm, and obtaining the optimal pose estimation under the local coordinate system of the robot terminal.
6. The vision-IMU fusion based multi-mobile-robot co-location method of claim 1, wherein the step S5 further comprises:
the robot terminal only sends data to the server at a specific moment;
the transmitted data includes: the spatial position of the feature, the corresponding descriptor and the pose at the current moment.
7. The vision-IMU fusion based multi-mobile-robot co-location method of claim 1, wherein the step S6 further comprises:
the server side sets a local coordinate system of the robot terminal corresponding to the received first frame data as a reference to be a global unified coordinate system;
and the server performs loop detection and relative pose transformation on data subsequently received by all the robot terminals, and maps the poses of all the robot terminals in the local coordinate system to the global unified coordinate system for representation.
8. The vision-IMU fusion-based multi-mobile-robot co-location method of claim 1, wherein the loop detection in step S6 is implemented by using a bag-of-words model, and further comprising:
collecting descriptors, and generating a dictionary by using a K-means clustering algorithm;
the characteristics of each key frame are represented by words and corresponding weights, and the difference of the two frames of data is calculated by comparing the weights corresponding to the two frames of data;
and when the difference of the two frames of data is lower than the threshold value, determining that the data is looped back.
9. The vision-IMU fusion based multi-mobile-robot co-location method of claim 1, wherein the step S7 further comprises:
after a loop is detected, the server side performs pose graph optimization under a global unified coordinate system, optimizes the poses of all the robot terminals and sends optimization results to the corresponding robot terminals;
the global unified pose graph optimization is realized by the following formula:
$$\min_{\mathcal{X}} \left\{ \sum_{(i,j) \in \mathcal{S}} \left\| r_{i,j}(\mathcal{X}) \right\|^2 + \sum_{(i,j) \in \mathcal{L}} \left\| r_{i,j}(\mathcal{X}) \right\|^2 \right\}$$
wherein $\mathcal{X}$ denotes the poses of all robot terminals in the global unified coordinate system, the first term contains the residual terms constructed between poses of the same robot terminal (the set $\mathcal{S}$ of sequential edges), and the second term contains the residual terms constructed between different robot terminals according to the detected loops (the set $\mathcal{L}$ of loop-closure edges).
10. A multi-mobile-robot cooperative positioning system based on vision-IMU fusion is characterized by comprising a plurality of robot terminals and a server side:
the plurality of robot terminals respectively communicate with the server side for data interaction,
and the plurality of robot terminals and the server side are configured to implement the method according to any one of claims 1-9.
CN202210737648.1A 2022-06-27 2022-06-27 Multi-mobile-robot cooperative positioning method and system based on vision-IMU fusion Pending CN115112123A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210737648.1A CN115112123A (en) 2022-06-27 2022-06-27 Multi-mobile-robot cooperative positioning method and system based on vision-IMU fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210737648.1A CN115112123A (en) 2022-06-27 2022-06-27 Multi-mobile-robot cooperative positioning method and system based on vision-IMU fusion

Publications (1)

Publication Number Publication Date
CN115112123A true CN115112123A (en) 2022-09-27

Family

ID=83330947

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210737648.1A Pending CN115112123A (en) 2022-06-27 2022-06-27 Multi-mobile-robot cooperative positioning method and system based on vision-IMU fusion

Country Status (1)

Country Link
CN (1) CN115112123A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115406447A (en) * 2022-10-31 2022-11-29 南京理工大学 Autonomous positioning method of quad-rotor unmanned aerial vehicle based on visual inertia in rejection environment

Similar Documents

Publication Publication Date Title
US11668571B2 (en) Simultaneous localization and mapping (SLAM) using dual event cameras
CN105371847B (en) A kind of interior real scene navigation method and system
CN107357286A (en) Vision positioning guider and its method
Panahandeh et al. Vision-aided inertial navigation based on ground plane feature detection
CN112785702A (en) SLAM method based on tight coupling of 2D laser radar and binocular camera
CN110044354A (en) A kind of binocular vision indoor positioning and build drawing method and device
US20170337701A1 (en) Method and system for 3d capture based on structure from motion with simplified pose detection
Giubilato et al. An experimental comparison of ros-compatible stereo visual slam methods for planetary rovers
Nyqvist et al. Pose estimation using monocular vision and inertial sensors aided with ultra wide band
Ruotsalainen et al. Heading change detection for indoor navigation with a smartphone camera
Wang et al. Visual odometry based on locally planar ground assumption
Kostavelis et al. Visual odometry for autonomous robot navigation through efficient outlier rejection
Xian et al. Fusing stereo camera and low-cost inertial measurement unit for autonomous navigation in a tightly-coupled approach
CN115112123A (en) Multi-mobile-robot cooperative positioning method and system based on vision-IMU fusion
US11076264B2 (en) Localization of a mobile device based on image and radio words
Caldato et al. ORB-ODOM: Stereo and odometer sensor fusion for simultaneous localization and mapping
CN116989772B (en) Air-ground multi-mode multi-agent cooperative positioning and mapping method
CN113570716A (en) Cloud three-dimensional map construction method, system and equipment
Kakillioglu et al. 3D sensor-based UAV localization for bridge inspection
Qian et al. Optical flow based step length estimation for indoor pedestrian navigation on a smartphone
Ling et al. RGB-D inertial odometry for indoor robot via keyframe-based nonlinear optimization
CN112731503A (en) Pose estimation method and system based on front-end tight coupling
US10977810B2 (en) Camera motion estimation
CN115420276A (en) Outdoor scene-oriented multi-robot cooperative positioning and mapping method
Liu et al. LSFB: A low-cost and scalable framework for building large-scale localization benchmark

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination