CN112016612A - Monocular depth estimation-based multi-sensor fusion SLAM method - Google Patents


Info

Publication number
CN112016612A
Authority
CN
China
Prior art keywords
map
loss function
image
representing
parallax
Prior art date
Legal status
Pending
Application number
CN202010871686.7A
Other languages
Chinese (zh)
Inventor
彭倍
刘帅
邵继业
曾双耀
Current Assignee
Sichuan Artigent Robotics Equipment Co ltd
Original Assignee
Sichuan Artigent Robotics Equipment Co ltd
Priority date
Filing date
Publication date
Application filed by Sichuan Artigent Robotics Equipment Co ltd filed Critical Sichuan Artigent Robotics Equipment Co ltd
Priority to CN202010871686.7A priority Critical patent/CN112016612A/en
Publication of CN112016612A publication Critical patent/CN112016612A/en
Pending legal-status Critical Current

Classifications

    • G06F18/25 Fusion techniques
    • G01C22/00 Measuring distance traversed on the ground by vehicles, persons, animals or other moving solid bodies, e.g. using odometers, using pedometers
    • G01S11/12 Systems for determining distance or velocity not using reflection or reradiation, using electromagnetic waves other than radio waves
    • G01S17/86 Combinations of lidar systems with systems other than lidar, radar or sonar, e.g. with direction finders
    • G01S17/93 Lidar systems specially adapted for specific applications for anti-collision purposes
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06T7/55 Depth or shape recovery from multiple images
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/56 Extraction of image or video features relating to colour
    • G06T2207/10016 Video; Image sequence
    • G06T2207/10024 Color image
    • G06T2207/10032 Satellite or aerial image; Remote sensing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Electromagnetism (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a multi-sensor fusion SLAM method based on monocular depth estimation, which comprises the following steps: firstly, acquiring an RGB image, laser point cloud information and mileage information of the current environment, and performing monocular depth estimation on the RGB image to acquire the corresponding depth information; secondly, synchronizing the topics corresponding to the depth information, the laser point cloud information and the mileage information in an ROS system; thirdly, establishing a new node according to the information transmitted in each synchronized topic to obtain a local grid map; then, carrying out loop detection on the local grid map; and finally, fusing the local grid maps after loop detection to create a global map. By creating the global map with a sensor combination of a laser radar and a pseudo-RGBD camera obtained through monocular depth estimation, the method reduces the economic cost and the device size compared with directly using an RGBD camera or a combination of a binocular camera and a laser radar sensor.

Description

Monocular depth estimation-based multi-sensor fusion SLAM method
Technical Field
The application relates to the technical field of mobile robots, in particular to a multi-sensor fusion SLAM method based on monocular depth estimation.
Background
Multi-sensor fusion is one of the future development directions of robot positioning and navigation, and within multi-sensor fusion, the fusion of laser radar and visual information is an important part. In existing approaches, visual images are obtained through an RGBD camera, a binocular camera and the like. However, RGBD cameras and binocular cameras that can directly obtain depth information are costly and bulky, which poses considerable problems in economic cost and structural design for low-cost or small-sized robots. Although a monocular camera is small and inexpensive, it cannot directly acquire depth information. It is therefore desirable to provide a solution to this problem.
Disclosure of Invention
The application aims to provide a multi-sensor fusion SLAM method based on monocular depth estimation, so as to achieve the technical effects of obtaining depth information of the surrounding environment through a monocular sensor while reducing cost and equipment size.
The embodiment of the application provides a multi-sensor fusion SLAM method based on monocular depth estimation, which comprises the following steps:
step 1, acquiring an RGB image, laser point cloud information and mileage information of a current environment, and performing monocular depth estimation on the RGB image to acquire corresponding depth information;
step 2, synchronizing topics corresponding to the depth information, the laser point cloud information and the mileage information in an ROS system;
step 3, establishing new nodes according to the information transmitted in each synchronized topic to obtain a local grid map;
step 4, carrying out loop detection on the local grid map;
and 5, fusing the local grid maps subjected to loop detection to create a global map.
The beneficial effect that this application can realize is: by creating the global map with a sensor combination of a laser radar and a pseudo-RGBD camera obtained through monocular depth estimation, the method reduces the economic cost and the device size compared with directly using an RGBD camera or a combination of a binocular camera and a laser radar sensor.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a schematic flowchart of a multi-sensor fusion SLAM method based on monocular depth estimation according to an embodiment of the present disclosure;
fig. 2 is a schematic view of a monocular depth estimation process according to an embodiment of the present disclosure;
fig. 3 is a schematic diagram of a new node creation process provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of the results of an analysis comparison experiment provided in the examples of the present application;
fig. 5 is a schematic diagram of a loopback detection result provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
Referring to fig. 1, fig. 1 is a schematic flowchart illustrating a multi-sensor fusion SLAM method based on monocular depth estimation according to an embodiment of the present disclosure.
The embodiment of the application provides a multi-sensor fusion SLAM method based on monocular depth estimation, which comprises the following steps:
step 1, acquiring an RGB image, laser point cloud information and mileage information of a current environment, and performing monocular depth estimation on the RGB image to acquire corresponding depth information;
step 2, synchronizing topics corresponding to the depth information, the laser point cloud information and the mileage information in an ROS system;
step 3, establishing new nodes according to the information transmitted in each synchronized topic to obtain a local grid map;
step 4, carrying out loop detection on the local grid map;
and 5, fusing the local grid maps subjected to loop detection to create a global map.
In one embodiment, the RGB image of the current environment can be acquired through the monocular sensor, the depth information of the RGB image is analyzed, and the result is then transmitted to the ROS system; the laser point cloud information of the current environment is acquired through a laser radar and transmitted to the ROS system; and the mileage information of the robot is acquired through an encoder and transmitted to the ROS system. The RGB image is obtained directly from the monocular sensor, while the depth information matched to it (including the depth of each pixel in the depth image) is obtained by learning through a CNN neural network.
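The following sketch illustrates how such a pseudo-RGBD pair can be formed: a trained disparity network predicts a disparity map from the RGB frame, and the disparity is converted to depth. The stub network, focal length and baseline values are illustrative placeholders and are not taken from the patent.

```python
# A sketch of forming the pseudo-RGBD pair: a trained disparity CNN (replaced here by a
# trivial stub) predicts a disparity map from the RGB frame, and the disparity is
# converted to depth via depth = focal_length * baseline / disparity. The stub network,
# focal length and baseline are illustrative placeholders, not values from the patent.
import torch
import torch.nn as nn

class DispNetStub(nn.Module):
    """Stand-in for the trained monocular disparity network."""
    def __init__(self):
        super().__init__()
        self.head = nn.Conv2d(3, 1, kernel_size=3, padding=1)
    def forward(self, x):
        return torch.sigmoid(self.head(x)) * 0.3        # disparity as a fraction of image width

def disparity_to_depth(disp, focal_px=525.0, baseline_m=0.12, width=640):
    disp_px = disp * width                               # normalized disparity -> pixels
    return focal_px * baseline_m / disp_px.clamp(min=1e-3)

model = DispNetStub().eval()
rgb = torch.rand(1, 3, 480, 640)                         # RGB frame from the monocular camera
with torch.no_grad():
    disparity = model(rgb)
depth = disparity_to_depth(disparity)                    # pseudo depth image aligned with rgb
print(tuple(depth.shape), float(depth.min()), float(depth.max()))
```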
Since different sensors publish to the ROS system on their own topics, and each sensor publishes its topic at a different rate, a topic synchronization scheme with a suitable policy is needed to synchronize the sensor topics. When topic synchronization is performed in step 2, three properties should hold. First, the output sets should not intersect: assuming there are two sets M and N, where i is a certain moment in the topic, then for all i the two sets should satisfy either Mi ≤ Ni or Ni ≤ Mi. Second, there is at least one topic for which no messages are discarded between two successive output groups, i.e. the sets should be contiguous. Third, the output sets should depend on the ROS time stamps of the messages, not on their actual arrival times.
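A minimal sketch of this synchronization step using the ROS message_filters package is given below; the topic names, message types, queue size and slop are illustrative assumptions. ApproximateTimeSynchronizer groups messages by their ROS header time stamps rather than by arrival time, matching the policy described above.

```python
# Synchronize depth image, laser scan and odometry topics with an approximate-time policy.
import rospy
import message_filters
from sensor_msgs.msg import Image, LaserScan
from nav_msgs.msg import Odometry

def synced_callback(depth_msg, scan_msg, odom_msg):
    # The three messages here belong to (approximately) the same time instant and can
    # be forwarded to the node-creation / local-mapping step.
    rospy.loginfo("synced stamps: %s %s %s",
                  depth_msg.header.stamp, scan_msg.header.stamp, odom_msg.header.stamp)

rospy.init_node("sensor_topic_sync")
depth_sub = message_filters.Subscriber("/camera/depth/image", Image)
scan_sub = message_filters.Subscriber("/scan", LaserScan)
odom_sub = message_filters.Subscriber("/odom", Odometry)
sync = message_filters.ApproximateTimeSynchronizer(
    [depth_sub, scan_sub, odom_sub], queue_size=10, slop=0.05)
sync.registerCallback(synced_callback)
rospy.spin()
```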
Please refer to fig. 2, fig. 2 is a schematic view illustrating a monocular depth estimation process according to an embodiment of the present application.
Monocular depth estimation estimates, through a function, the depth of each pixel in the monocular image given to the function as input. Such a function can be briefly written as:

$$\hat{d} = f(I)$$

where $\hat{d}$ is the per-pixel depth information and $I$ is the input image. The training of the function comprises the following steps:
firstly, acquiring a real left image and a real right image of a surrounding environment;
secondly, estimating a right disparity map and a left disparity map according to the real left map through a CNN neural network;
and finally, overlapping the right disparity map and the real left map to obtain a right map estimation map, and overlapping the left disparity map and the real right map to obtain a left map estimation map.
In the above process, the loss functions of the right and left image estimation maps in the estimation process can be analyzed, and the loss functions are reduced to expected values by adjusting the parameters of the CNN neural network.
Specifically, the loss function comprises an image reconstruction loss function $C_{ap}$, a parallax smoothing loss function $C_{ds}$ and a left-right consistency loss function $C_{lr}$. Taking the real left image as an example, its image reconstruction loss function $C^l_{ap}$ is calculated as follows:

$$C^l_{ap} = \frac{1}{N}\sum_{i,j}\left[\alpha\,\frac{1-\mathrm{SSIM}\!\left(I^l_{ij},\tilde{I}^l_{ij}\right)}{2} + (1-\alpha)\left\|I^l_{ij}-\tilde{I}^l_{ij}\right\|\right]$$

where $C^l_{ap}$ denotes the first image reconstruction loss function; $N$ denotes the number of pixels; $I^l$ denotes the real left image; $\tilde{I}^l$ denotes the estimated left image; and $\alpha$ is the weight between the basic reconstruction error and the similarity error. In one embodiment, a single-scale SSIM (structural similarity index) with simplified 3 × 3 filtering may be employed, with $\alpha$ taken to be 0.82. The second image reconstruction loss function $C^r_{ap}$ is obtained in the same way.
The parallax smoothing loss function $C_{ds}$ is calculated as follows:

$$C^l_{ds} = \frac{1}{N}\sum_{i,j}\left(\left|\partial_x d^l_{ij}\right| e^{-\left\|\partial_x I^l_{ij}\right\|} + \left|\partial_y d^l_{ij}\right| e^{-\left\|\partial_y I^l_{ij}\right\|}\right)$$

where $C^l_{ds}$ denotes the first parallax smoothing loss function; $\partial_x d^l$ denotes the parallax gradient of the left disparity map in the x direction; $\partial_x I^l$ denotes the image gradient of the real left image in the x direction; $\partial_y d^l$ denotes the parallax gradient of the left disparity map in the y direction; and $\partial_y I^l$ denotes the image gradient of the real left image in the y direction. The second parallax smoothing loss function $C^r_{ds}$ is obtained in the same way.
In practice, the left-to-right parallax and the right-to-left parallax of a binocular camera are completely equivalent. The CNN neural network receives only the left image as input while predicting both the left and right disparity maps. To keep the two consistent, a penalty for left-right disparity inconsistency is added to the loss function, forcing the left disparity map to equal the projected right disparity map; adding this strong prior supervision effectively improves the accuracy of the network's disparity prediction. The left-right consistency loss function $C_{lr}$ is calculated as follows:

$$C^l_{lr} = \frac{1}{N}\sum_{i,j}\left|d^l_{ij} - d^r_{ij+d^l_{ij}}\right|$$

where $C^l_{lr}$ denotes the first left-right consistency loss function; $d^l$ denotes the left disparity map; and $d^r_{ij+d^l_{ij}}$ denotes the reconstructed (projected) disparity map of the left disparity map. The second left-right consistency loss function $C^r_{lr}$ is obtained in the same way.
The three loss functions are weighted and summed to obtain the total loss function:

$$C_s = \alpha_{ap}\left(C^l_{ap}+C^r_{ap}\right) + \alpha_{ds}\left(C^l_{ds}+C^r_{ds}\right) + \alpha_{lr}\left(C^l_{lr}+C^r_{lr}\right)$$

where $C_s$ denotes the total loss function, $\alpha_{ap}$ denotes the weight of the image reconstruction loss function, $\alpha_{ds}$ denotes the weight of the parallax smoothing loss function, and $\alpha_{lr}$ denotes the weight of the left-right consistency loss function. In one embodiment, $\alpha_{ap}$ may be 0.9, $\alpha_{ds}$ may be 1, and $\alpha_{lr}$ may be 0.01.
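A compact PyTorch sketch of this weighted total loss is given below. The SSIM + L1 appearance terms are assumed to be computed elsewhere, and the function names, tensor layout and projected-disparity inputs are illustrative assumptions rather than the patent's implementation.

```python
# Minimal sketch of the total loss: edge-aware smoothness, left-right consistency and
# the weighted sum with the weights 0.9 / 1 / 0.01 named in the text.
import torch

def smoothness_loss(disp, img):
    """Edge-aware disparity smoothness: |∂d| weighted by exp(-|∂I|)."""
    dx_d = torch.abs(disp[:, :, :, 1:] - disp[:, :, :, :-1])
    dy_d = torch.abs(disp[:, :, 1:, :] - disp[:, :, :-1, :])
    dx_i = torch.mean(torch.abs(img[:, :, :, 1:] - img[:, :, :, :-1]), dim=1, keepdim=True)
    dy_i = torch.mean(torch.abs(img[:, :, 1:, :] - img[:, :, :-1, :]), dim=1, keepdim=True)
    return (dx_d * torch.exp(-dx_i)).mean() + (dy_d * torch.exp(-dy_i)).mean()

def lr_consistency_loss(disp, disp_other_projected):
    """Penalize disagreement between a disparity map and the projected opposite-view map."""
    return torch.abs(disp - disp_other_projected).mean()

def total_loss(app_l, app_r, disp_l, disp_r, img_l, img_r,
               disp_r_proj_to_l, disp_l_proj_to_r,
               a_ap=0.9, a_ds=1.0, a_lr=0.01):
    c_ap = app_l + app_r                                        # image reconstruction terms
    c_ds = smoothness_loss(disp_l, img_l) + smoothness_loss(disp_r, img_r)
    c_lr = (lr_consistency_loss(disp_l, disp_r_proj_to_l) +
            lr_consistency_loss(disp_r, disp_l_proj_to_r))
    return a_ap * c_ap + a_ds * c_ds + a_lr * c_lr
```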
As shown in fig. 3, fig. 3 is a schematic diagram of a new node creation process provided in the embodiment of the present application.
In one embodiment, the new node may be obtained by visual odometry; specifically, the new node is established by performing the following procedure in sequence:

First, feature detection is performed. Specifically, a frame of image is acquired, and GFTT (good features to track) detection is applied to the frame to obtain a feature map.

Second, feature matching is performed. Specifically, the features in the feature map are described with BRIEF (binary robust independent elementary features) descriptors and matched through an NNDR (nearest neighbor distance ratio) test; the feature map contains 3D features with descriptors from the last key frame. A sketch of this detection and matching step is given after this procedure.

Third, motion estimation is performed. Specifically, a PnP algorithm is used to compute the feature transformation of the current frame with respect to the feature map, implemented as:

$$\lambda_i v_i = R P_i + t \qquad (1)$$

where $\lambda_i$ is the depth of the 2D reference point $v_i$ in the camera coordinate system, $v_i$ satisfies $\|v_i\| = 1$, $R$ is the rotation matrix, $t$ is the translation vector, and $P_i$ is the spatial 3D reference point in the world coordinate system.

Fourth, local BA (bundle adjustment) optimization is performed. Specifically, the displacement result is refined using the features of all key frames in the feature map.

Fifth, the pose is updated. Specifically, the output odometry and the coordinate transformation are updated using the feature transformation from motion estimation; that is, the output odometry and the ROS tf-tree transform from the odometry coordinate system to the "base_link" coordinate system are updated. When the covariance is computed, the median absolute deviation (MAD) of the 3D features is used.

Finally, motion prediction is performed. Specifically, based on the updated pose, a motion model predicts the positions of the feature-map features in the current frame, and feature matching is then performed again.
In one embodiment, the present application uses an improved PnP algorithm for motion estimation, as follows. Let a space point $P_i$ in the world coordinate system project to $P_i' = [x', y']^T$ in the image coordinate system. Since there is some error and uncertainty in measuring the location of the projected point, this uncertainty can be expressed with a two-dimensional covariance matrix (where $p'$ corresponds to $P_i'$):

$$\Sigma_{p'p'} = \begin{bmatrix} \sigma_{x'x'} & \sigma_{x'y'} \\ \sigma_{y'x'} & \sigma_{y'y'} \end{bmatrix}$$

To introduce this uncertainty into the whole calculation, two-dimensional covariance matrices are likewise used to express the coordinate normalization and the transformation of the point's space vector in the camera coordinate system. The uncertainty of the reference point is finally expressed as a matrix $C$, built from $\hat{\Sigma}_{p'p'}$, the normalized form of the matrix $\Sigma_{p'p'}$.
Writing equation (1) in the form of a homogeneous system of linear equations:

$$Bu = 0$$

where $B$ is the coefficient matrix of the homogeneous system and $u$ is the column vector formed by stacking the unknown entries of the rotation matrix $R$ and the translation vector $t$.

The uncertainty of the reference points described above is added to the resulting homogeneous linear system:

$$B^T C B u = N u = 0$$

where $u$ satisfies the constraint $\|u\| = 1$. Singular value decomposition is then applied to the coefficient matrix:

$$N = U D V^T$$

where $U$, $D$ and $V^T$ are the factors of the singular value decomposition, with $D$ containing the singular values.
The singular value decomposition yields a solution vector, from which a rotation matrix $\hat{R}$ and a translation vector $\hat{t}$ are recovered. The recovered $\hat{t}$ reflects only the translation direction and carries no distance information, so the distance information is added to obtain the translation vector $t$.

Singular value decomposition is then applied to $\hat{R}$,

$$\hat{R} = U_R D_R V_R^T$$

and the rotation matrix is obtained as

$$R = U_R V_R^T$$

Finally, with the initial values of the camera's rotation matrix and translation vector obtained, an error function is constructed and the optimal solution is searched iteratively using the Gauss-Newton method.
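For comparison, the sketch below solves the same pose problem with OpenCV's standard RANSAC PnP followed by an iterative Levenberg-Marquardt refinement, as a stand-in for the uncertainty-weighted PnP and Gauss-Newton refinement derived above; the synthetic points, noise level and camera intrinsics are illustrative assumptions.

```python
# Baseline PnP: RANSAC initialization plus iterative refinement of R and t.
import cv2
import numpy as np

object_pts = np.random.rand(50, 3)                                # 3D reference points, world frame
K = np.array([[525.0, 0.0, 319.5], [0.0, 525.0, 239.5], [0.0, 0.0, 1.0]])
dist = np.zeros(5)                                                # assume no lens distortion
true_rvec = np.array([[0.10], [0.20], [0.05]])
true_tvec = np.array([[0.30], [-0.10], [1.50]])

image_pts, _ = cv2.projectPoints(object_pts, true_rvec, true_tvec, K, dist)
image_pts += np.random.normal(0.0, 0.5, image_pts.shape)          # simulated measurement noise

ok, rvec, tvec, inliers = cv2.solvePnPRansac(object_pts, image_pts, K, dist,
                                             reprojectionError=3.0)
rvec, tvec = cv2.solvePnPRefineLM(object_pts[inliers[:, 0]], image_pts[inliers[:, 0]],
                                  K, dist, rvec, tvec)
R, _ = cv2.Rodrigues(rvec)                                        # rotation matrix from rvec
print("inliers:", len(inliers))
print("R =\n", R, "\nt =", tvec.ravel())
```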
In one embodiment, the step of establishing a new node from the information transmitted in each synchronized topic to obtain the local grid map further includes: analyzing whether the number of inliers computed during motion estimation is below a set fixed threshold, and if so, updating the key frames used in feature matching and local BA optimization.

Specifically, if the number of inliers computed during motion estimation is below the fixed threshold, the key frames or the feature map are updated. For the feature map, the matched features of the new frame are added and their positions are updated through the local BA optimization module. When the size of the feature map exceeds a fixed threshold, the oldest features that do not match the current frame are deleted. If a key frame no longer has any feature in the feature map, it is discarded.
In one embodiment, the process of fusing the local grid maps after loop detection to create the global map uses a Bayesian filtering algorithm to fuse the local grid maps:

$$p(\mathrm{state}_i \mid z_{t+1}) = \left[1 + \frac{1 - p_{obs}(\mathrm{state}_i \mid z_{t+1})}{p_{obs}(\mathrm{state}_i \mid z_{t+1})}\cdot\frac{1 - p(\mathrm{state}_i \mid z_t)}{p(\mathrm{state}_i \mid z_t)}\right]^{-1}$$

where $z$ denotes the observed value of the local map; $\mathrm{state}_i$ is the state of the i-th grid cell; $p(\mathrm{state}_i \mid z_{t+1})$ and $p(\mathrm{state}_i \mid z_t)$ are the state distributions of the i-th grid cell at time t+1 and time t, respectively; and $p_{obs}(\mathrm{state}_i \mid z_{t+1})$ is a preset constant. Illustratively, when the observed state of the i-th grid cell at time t+1 is occupied, $p_{obs}(\mathrm{state}_i \mid z_{t+1}) = 0.8$; when the observed state at time t+1 is free, $p_{obs}(\mathrm{state}_i \mid z_{t+1}) = 0.3$.
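A small Python sketch of this per-cell fusion is given below, written as a recursive binary Bayes update in odds form; the grid size and the encoding of the local map are illustrative, and the exact algebraic form used by the patent may differ.

```python
# Per-cell Bayesian occupancy update with the 0.8 / 0.3 observation constants from the text.
import numpy as np

P_OCCUPIED_OBS = 0.8      # p_obs when the new observation says "occupied"
P_FREE_OBS = 0.3          # p_obs when the new observation says "free"

def bayes_update(p_prev, observed_occupied):
    """Fuse one observation of a cell into its occupancy probability."""
    p_obs = P_OCCUPIED_OBS if observed_occupied else P_FREE_OBS
    odds = (p_obs / (1.0 - p_obs)) * (p_prev / (1.0 - p_prev))
    return odds / (1.0 + odds)

# Fuse a local grid map into the global grid: -1 = unknown, 0 = free, 1 = occupied
global_prob = np.full((100, 100), 0.5)                     # prior: everything unknown
local_map = np.random.choice([-1, 0, 1], size=(100, 100))  # stand-in for a real local map
for idx in zip(*np.nonzero(local_map >= 0)):
    global_prob[idx] = bayes_update(global_prob[idx], local_map[idx] == 1)
print("occupied cells:", int((global_prob > 0.5).sum()))
```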
As shown in fig. 4, fig. 4 is a schematic diagram of the results of the analysis and comparison experiment provided in the embodiment of the present application.
In one embodiment, the application performs algorithmic verification with the freiburg2_pioneer_360 dataset from the Computer Vision Group of the Technical University of Munich (TUM). The dataset is recorded by a Kinect camera mounted on top of a Pioneer robot that moves in circles under the control of a remote handle; the dataset also includes the robot's lidar data and odometry data. Of the Kinect data, the experiment uses only the RGB images.
First, the dataset's rosbag file is played back in the ROS system, so that the RGB images of the Kinect camera, the laser point cloud information of the laser radar and the mileage information of the odometer are input into the ROS system. The monocular depth estimation module estimates a depth image sequence from the RGB image sequence and feeds it into the ROS system, and the camera motion is obtained by fusing the image information with the laser radar point cloud information. As can be seen from fig. 4, the path obtained with the method of the present application has a smaller cumulative error and is closer to the real path.
As shown in fig. 5, fig. 5 is a schematic diagram of a loop detection result provided in the embodiment of the present application.
As the robot moves, its estimated path still drifts from the actual path, so a loop detection module is needed to reduce the influence of accumulated error on the robot path. After loop detection, the robot path is re-adjusted and a more accurate final result is obtained.
In one embodiment, a bag-of-words method may be used for loop detection. When a new node is created, the program extracts visual features from the RGB image (the same GFTT features as used in building the node can be reused) and quantizes them into an incremental visual vocabulary. GFTT can extract a large amount of feature information in a short time with rich feature diversity; to improve loop detection efficiency, only the subset of odometry features with the highest response is quantized into visual words, while the other features are kept in the node for when a loop closure transformation has to be computed. The newly created node is then compared with previous nodes to detect a loop closure.
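A minimal bag-of-words sketch of this loop detection step follows: binary descriptors are quantized against a small k-means vocabulary and nodes are compared by the cosine similarity of their word histograms. The randomly generated descriptors stand in for the GFTT/BRIEF features of real nodes, and the vocabulary size, similarity measure and threshold are illustrative assumptions rather than the patent's settings.

```python
# Bag-of-words loop-closure candidate search over per-node descriptor histograms.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def bow_histogram(descriptors, vocab):
    words = vocab.predict(descriptors.astype(np.float32))
    hist, _ = np.histogram(words, bins=np.arange(vocab.n_clusters + 1))
    return hist / max(hist.sum(), 1)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Simulated per-node descriptor sets (in the real pipeline these come from the RGB image)
node_descriptors = [np.random.randint(0, 256, (300, 32), dtype=np.uint8) for _ in range(20)]

vocab = MiniBatchKMeans(n_clusters=64).fit(
    np.vstack(node_descriptors).astype(np.float32))         # stand-in for the visual vocabulary
histograms = [bow_histogram(d, vocab) for d in node_descriptors]

new_node = histograms[-1]
for j, h in enumerate(histograms[:-1]):
    if cosine(new_node, h) > 0.8:                            # similarity threshold (illustrative)
        print("loop closure candidate: current node vs node", j)
```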
In summary, the present application provides a multi-sensor fusion SLAM method based on monocular depth estimation, which comprises: acquiring an RGB image, laser point cloud information and mileage information of the current environment, and performing monocular depth estimation on the RGB image to acquire the corresponding depth information; synchronizing the topics corresponding to the depth information, the laser point cloud information and the mileage information in the ROS system; establishing a new node according to the information transmitted in each synchronized topic to obtain a local grid map; carrying out loop detection on the local grid map; and fusing the local grid maps after loop detection to create a global map. By creating the global map with a sensor combination of a laser radar and a pseudo-RGBD camera obtained through monocular depth estimation, the method reduces the economic cost and the device size compared with directly using an RGBD camera or a combination of a binocular camera and a laser radar sensor.
The above description covers only specific embodiments of the present application, but the protection scope of the present application is not limited thereto; any change or substitution that can be easily conceived by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (6)

1. A multi-sensor fusion SLAM method based on monocular depth estimation is characterized by comprising the following steps:
step 1, acquiring an RGB image, laser point cloud information and mileage information of a current environment, and performing monocular depth estimation on the RGB image to acquire corresponding depth information;
step 2, synchronizing topics corresponding to the depth information, the laser point cloud information and the mileage information in an ROS system;
step 3, establishing new nodes according to the information transmitted in each synchronized topic to obtain a local grid map;
step 4, carrying out loop detection on the local grid map;
and 5, fusing the local grid maps subjected to loop detection to create a global map.
2. The method of claim 1, wherein step 1 comprises:
step 11, acquiring a real left image and a real right image of the surrounding environment;
step 12, estimating a right disparity map and a left disparity map according to the real left map through a CNN neural network;
and step 13, overlapping the right disparity map and the real left map to obtain a right map estimation map, and overlapping the left disparity map and the real right map to obtain a left map estimation map.
3. The method of claim 2, wherein step 1 further comprises:
step 14, analyzing the loss functions of the right graph estimation graph and the left graph estimation graph, and adjusting the parameters of the CNN neural network to reduce the total loss function to an expected value; the method comprises the following steps:
analyzing a first image reconstruction loss function of the estimation process from the real left image and the estimated left image, and analyzing a second image reconstruction loss function of the estimation process from the real right image and the estimated right image; wherein the first image reconstruction loss function is expressed as:

$$C^l_{ap} = \frac{1}{N}\sum_{i,j}\left[\alpha\,\frac{1-\mathrm{SSIM}\!\left(I^l_{ij},\tilde{I}^l_{ij}\right)}{2} + (1-\alpha)\left\|I^l_{ij}-\tilde{I}^l_{ij}\right\|\right]$$

where $C^l_{ap}$ denotes the first image reconstruction loss function, $N$ denotes the number of pixels, $I^l$ denotes the real left image, $\tilde{I}^l$ denotes the estimated left image, and $\alpha$ is the weight between the basic reconstruction error and the similarity error; the second image reconstruction loss function $C^r_{ap}$ is obtained in the same way;
Analyzing a first parallax smoothing loss function from the parallax gradients of the left disparity map and the image gradients of the real left image, and analyzing a second parallax smoothing loss function from the parallax gradients of the right disparity map and the image gradients of the real right image; wherein the first parallax smoothing loss function is expressed as:

$$C^l_{ds} = \frac{1}{N}\sum_{i,j}\left(\left|\partial_x d^l_{ij}\right| e^{-\left\|\partial_x I^l_{ij}\right\|} + \left|\partial_y d^l_{ij}\right| e^{-\left\|\partial_y I^l_{ij}\right\|}\right)$$

where $C^l_{ds}$ denotes the first parallax smoothing loss function, $\partial_x d^l$ denotes the parallax gradient of the left disparity map in the x direction, $\partial_x I^l$ denotes the image gradient of the real left image in the x direction, $\partial_y d^l$ denotes the parallax gradient of the left disparity map in the y direction, and $\partial_y I^l$ denotes the image gradient of the real left image in the y direction; the second parallax smoothing loss function $C^r_{ds}$ is obtained in the same way;
Reconstructing disparity maps from the left disparity map and the right disparity map, and analyzing a first left-right consistency loss function of the left disparity map and a second left-right consistency loss function of the right disparity map; wherein the first left-right consistency loss function is expressed as:

$$C^l_{lr} = \frac{1}{N}\sum_{i,j}\left|d^l_{ij} - d^r_{ij+d^l_{ij}}\right|$$

where $C^l_{lr}$ denotes the first left-right consistency loss function, $d^l$ denotes the left disparity map, and $d^r_{ij+d^l_{ij}}$ denotes the reconstructed (projected) disparity map of the left disparity map; the second left-right consistency loss function $C^r_{lr}$ is obtained in the same way;
Weighting the first image reconstruction loss function, the second image reconstruction loss function, the first parallax smoothing loss function, the second parallax smoothing loss function, the first left-right consistency loss function and the second left-right consistency loss function to obtain the total loss function; wherein the total loss function is expressed as:

$$C_s = \alpha_{ap}\left(C^l_{ap}+C^r_{ap}\right) + \alpha_{ds}\left(C^l_{ds}+C^r_{ds}\right) + \alpha_{lr}\left(C^l_{lr}+C^r_{lr}\right)$$

where $C_s$ denotes the total loss function, $\alpha_{ap}$ denotes the weight of the image reconstruction loss function, $\alpha_{ds}$ denotes the weight of the parallax smoothing loss function, and $\alpha_{lr}$ denotes the weight of the left-right consistency loss function.
4. The method of claim 1, wherein step 3 comprises:
step 31, feature detection: acquiring a frame of image, and performing feature detection on the frame of image by using GFTT features to acquire a feature map;
step 32, feature matching: extracting features in the feature map by using a binary robust independent basic feature descriptor through NNDR test; the feature map contains 3D features with descriptors from the last key frame;
step 33, motion estimation: a PnP algorithm is used to calculate the feature transformation of the current frame corresponding to the feature map, implemented as

$$\lambda_i v_i = R P_i + t$$

where $\lambda_i$ is the depth of the 2D reference point $v_i$ in the camera coordinate system, $v_i$ satisfies $\|v_i\| = 1$, $R$ is the rotation matrix, $t$ is the translation vector, and $P_i$ is a spatial 3D reference point in the world coordinate system;
step 34, local BA optimization: refining the displacement result by using the characteristics of all key frames in the characteristic diagram;
step 35, pose updating: updating the output odometer and the coordinate transformation by using the characteristic transformation of the motion estimation;
step 36, motion prediction: and according to the pose updating result, performing motion prediction on the position in the current frame by using the features of the feature map, and then performing the step 32 again.
5. The method of claim 4, wherein step 3 further comprises:
and step 37, analyzing whether the number of inliers calculated in the motion estimation process is lower than a set fixed threshold, and if the number of inliers is lower than the set fixed threshold, updating the key frames in the step 32 and the step 34.
6. The method according to claim 1, wherein in said step 5 the local grid maps are fused using a Bayesian filtering algorithm:

$$p(\mathrm{state}_i \mid z_{t+1}) = \left[1 + \frac{1 - p_{obs}(\mathrm{state}_i \mid z_{t+1})}{p_{obs}(\mathrm{state}_i \mid z_{t+1})}\cdot\frac{1 - p(\mathrm{state}_i \mid z_t)}{p(\mathrm{state}_i \mid z_t)}\right]^{-1}$$

where $z$ denotes the observed value of the local map; $\mathrm{state}_i$ is the state of the i-th grid cell; $p(\mathrm{state}_i \mid z_{t+1})$ and $p(\mathrm{state}_i \mid z_t)$ are the state distributions of the i-th grid cell at time t+1 and time t, respectively; and $p_{obs}(\mathrm{state}_i \mid z_{t+1})$ is a preset constant.
CN202010871686.7A 2020-08-26 2020-08-26 Monocular depth estimation-based multi-sensor fusion SLAM method Pending CN112016612A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010871686.7A CN112016612A (en) 2020-08-26 2020-08-26 Monocular depth estimation-based multi-sensor fusion SLAM method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010871686.7A CN112016612A (en) 2020-08-26 2020-08-26 Monocular depth estimation-based multi-sensor fusion SLAM method

Publications (1)

Publication Number Publication Date
CN112016612A true CN112016612A (en) 2020-12-01

Family

ID=73503706

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010871686.7A Pending CN112016612A (en) 2020-08-26 2020-08-26 Monocular depth estimation-based multi-sensor fusion SLAM method

Country Status (1)

Country Link
CN (1) CN112016612A (en)



Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120121161A1 (en) * 2010-09-24 2012-05-17 Evolution Robotics, Inc. Systems and methods for vslam optimization
US20170337470A1 (en) * 2016-05-20 2017-11-23 Magic Leap, Inc. Method and system for performing convolutional image transformation estimation
CN106127739A (en) * 2016-06-16 2016-11-16 华东交通大学 A kind of RGB D SLAM method of combination monocular vision
US20190377952A1 (en) * 2016-06-24 2019-12-12 Robert Bosch Gmbh RGB-D Camera Based Tracking System and Method Thereof
CN107193279A (en) * 2017-05-09 2017-09-22 复旦大学 Robot localization and map structuring system based on monocular vision and IMU information
EP3451288A1 (en) * 2017-09-04 2019-03-06 Universität Zürich Visual-inertial odometry with an event camera
WO2020111844A2 (en) * 2018-11-28 2020-06-04 서울대학교 산학협력단 Method and apparatus for enhancing image feature point in visual slam by using object label
CN110443843A (en) * 2019-07-29 2019-11-12 东北大学 A kind of unsupervised monocular depth estimation method based on generation confrontation network
CN110501017A (en) * 2019-08-12 2019-11-26 华南理工大学 A kind of Mobile Robotics Navigation based on ORB_SLAM2 ground drawing generating method
CN111045017A (en) * 2019-12-20 2020-04-21 成都理工大学 Method for constructing transformer substation map of inspection robot by fusing laser and vision
CN111089585A (en) * 2019-12-30 2020-05-01 哈尔滨理工大学 Mapping and positioning method based on sensor information fusion
CN111462207A (en) * 2020-03-30 2020-07-28 重庆邮电大学 RGB-D simultaneous positioning and map creation method integrating direct method and feature method

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
庄严, 王伟, 王珂, 徐晓东 (Zhuang Yan, Wang Wei, Wang Ke, Xu Xiaodong): "Indoor simultaneous localization and mapping for mobile robots based on laser ranging and monocular vision", Acta Automatica Sinica (自动化学报), No. 06 *
李爽 (Li Shuang): "Research on vision-based indoor localization algorithms", China Master's Theses Full-text Database, Information Science and Technology (中国优秀硕士学位论文全文数据库 信息科技辑), pages 49-51 *
王化友; 代波; 何玉庆 (Wang Huayou; Dai Bo; He Yuqing): "CFD-SLAM: a fast and robust SLAM system fusing the feature-based and direct methods", High Technology Letters (高技术通讯), No. 12 *
王消为; 贺利乐; 赵涛 (Wang Xiaowei; He Lile; Zhao Tao): "Research on mobile robot SLAM based on lidar and binocular vision", Chinese Journal of Sensors and Actuators (传感技术学报), No. 03 *
骆燕燕; 陈龙 (Luo Yanyan; Chen Long): "Laser localization and mapping fused with visual information", Industrial Control Computer (工业控制计算机), No. 12 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112561824A (en) * 2020-12-21 2021-03-26 中国科学院自动化研究所 Data alignment restoration method and system for fusion of 2D laser and depth image
CN112561824B (en) * 2020-12-21 2023-04-07 中国科学院自动化研究所 Data alignment restoration method and system for fusion of 2D laser and depth image
CN112446905A (en) * 2021-01-29 2021-03-05 中国科学院自动化研究所 Three-dimensional real-time panoramic monitoring method based on multi-degree-of-freedom sensing association
CN112446905B (en) * 2021-01-29 2021-05-11 中国科学院自动化研究所 Three-dimensional real-time panoramic monitoring method based on multi-degree-of-freedom sensing association
CN113219492A (en) * 2021-03-30 2021-08-06 苏州市卫航智能技术有限公司 Method and system for positioning and navigating river course ship driving
CN114624688A (en) * 2022-03-15 2022-06-14 电子科技大学 Tracking and positioning method based on multi-sensor combination
CN114624688B (en) * 2022-03-15 2023-04-07 电子科技大学 Tracking and positioning method based on multi-sensor combination

Similar Documents

Publication Publication Date Title
CN110827415B (en) All-weather unknown environment unmanned autonomous working platform
CN112016612A (en) Monocular depth estimation-based multi-sensor fusion SLAM method
CN110009674B (en) Monocular image depth of field real-time calculation method based on unsupervised depth learning
CN113811920A (en) Distributed pose estimation
US20130335528A1 (en) Imaging device capable of producing three dimensional representations and methods of use
CN107808407A (en) Unmanned plane vision SLAM methods, unmanned plane and storage medium based on binocular camera
CN110717927A (en) Indoor robot motion estimation method based on deep learning and visual inertial fusion
CN112200165A (en) Model training method, human body posture estimation method, device, equipment and medium
CN110969648B (en) 3D target tracking method and system based on point cloud sequence data
CN110070578B (en) Loop detection method
US11948309B2 (en) Systems and methods for jointly training a machine-learning-based monocular optical flow, depth, and scene flow estimator
CN111462324A (en) Online spatiotemporal semantic fusion method and system
KR101869605B1 (en) Three-Dimensional Space Modeling and Data Lightening Method using the Plane Information
CN111753696A (en) Method for sensing scene information, simulation device and robot
Tarrio et al. Realtime edge based visual inertial odometry for MAV teleoperation in indoor environments
CN114627491A (en) Single three-dimensional attitude estimation method based on polar line convergence
CN115900710A (en) Dynamic environment navigation method based on visual information
US11188787B1 (en) End-to-end room layout estimation
CN114898355A (en) Method and system for self-supervised learning of body-to-body movements for autonomous driving
CN114266823A (en) Monocular SLAM method combining SuperPoint network characteristic extraction
Zhu et al. PairCon-SLAM: Distributed, online, and real-time RGBD-SLAM in large scenarios
CN115727854A (en) VSLAM positioning method based on BIM structure information
CN113902995B (en) Multi-mode human behavior recognition method and related equipment
Liao et al. VI-NeRF-SLAM: a real-time visual–inertial SLAM with NeRF mapping
Demiannay et al. Design and Development of Multirotor UAV with Automatic Collision Avoidance using Stereo Camera

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination